Preface


Instruction Set Architecture 


Synergistic Processor Unit 


The purpose of this document is to describe the Synergistic Processor Unit (SPU) Instruction Set Architecture 
(ISA) as it relates to the Cell Broadband Engine™ Architecture (CBEA). 


Who Should Read This Document 


This document is intended for designers who plan to develop products using the SPU ISA. Use this document 
in conjunction with the documents listed in Related Documents on page 13. 


Related Documents 


The following documents are reference materials for the SPU ISA. 
































Title                                                    Version   Date
Cell Broadband Engine Architecture                       1.01      October 2006
PowerPC User Instruction Set Architecture, Book I        2.02      January 2005
PowerPC Virtual Environment Architecture, Book II        2.02      January 2005
PowerPC Operating Environment Architecture, Book III     2.02      January 2005
Document Organization

Section
    Description

Front Matter
    Title Page, Copyright and Disclaimer, Contents, List of Figures, List of Tables

Preface
    Describes this document, related documents, responsibilities, and other general information

Revision Log
    High-level list of changes from the last version to this version

Section 1 Introduction
    Provides a high-level description of the SPU architecture and its purpose.

Section 2 SPU Architectural Overview
    Provides an overview of the SPU architecture.

Section 3 Memory—Load/Store Instructions
    Lists and describes the SPU load/store instructions.

Section 4 Constant-Formation Instructions
    Lists and describes the SPU constant-formation instructions.

Section 5 Integer and Logical Instructions
    Lists and describes the SPU integer and logical instructions.

Section 6 Shift and Rotate Instructions
    Lists and describes the SPU shift and rotate instructions.

Section 7 Compare, Branch, and Halt Instructions
    Lists and describes the SPU compare, branch, and halt instructions.

Section 8 Hint-for-Branch Instructions
    Lists and describes the SPU hint-for-branch instruction.

Section 9 Floating-Point Instructions
    Lists and describes the SPU floating-point instructions.

Section 10 Control Instructions
    Lists and describes the SPU control instructions.

Section 11 Channel Instructions
    Describes the instructions used to communicate between the SPU and external devices through
    the channel interfaces.

Section 12 SPU Interrupt Facility
    Describes the SPU interrupt facility.

Section 13 Synchronization and Ordering
    Describes the SPU sequentially ordered programming model.

Appendix A Instruction Table Sorted by Instruction Mnemonic
    Lists the SPU instructions sorted by their mnemonics.

Appendix B Details of the Generate Controls Instructions
    Provides the details of the masks that are generated by the generate controls instructions.














Version Numbering 


The document version number appears on the title page and in the footer of every page. The format of the 
version number is V.xy, where: 


* V is the major version level. This number is incremented when a new required feature is added to the
architecture. The major and minor revision numbers are set to zero. For example, version 1.12 becomes
version 2.00.


* x is the major revision level. This number is incremented when a new, optional feature is added to the
architecture or a major change is added that could affect a programmer. The minor revision level is set to
zero. For example, version 1.12 becomes version 1.20.


* y is the minor revision level. This number is incremented for every new release that does not contain any
new required or optional features. For example, version 1.12 becomes version 1.13.
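
The increment rules for the V.xy format can be sketched in a few lines of Python. This is an illustration only: the function name bump and the three kind labels are hypothetical, and the sketch assumes that each revision level is a single decimal digit that does not overflow.

```python
def bump(version: str, kind: str) -> str:
    """Return the next version string in V.xy format.

    kind is "major_version", "major_revision", or "minor_revision"
    (hypothetical labels for the V, x, and y levels).
    """
    major, revision = version.split(".")
    v, x, y = int(major), int(revision[0]), int(revision[1])
    if kind == "major_version":       # new required feature: reset x and y
        v, x, y = v + 1, 0, 0
    elif kind == "major_revision":    # new optional feature or major change: reset y
        x, y = x + 1, 0
    elif kind == "minor_revision":    # release without new features
        y += 1
    else:
        raise ValueError(f"unknown kind: {kind}")
    return f"{v}.{x}{y}"
```

Applied to version 1.12, the three kinds reproduce the three examples given in the list.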
How to Use the Instruction Descriptions


Figure i illustrates how to use the instruction descriptions provided in this document. 


Figure i. Format of an Instruction Description

[Figure: an annotated copy of the Load Quadword (d-form) instruction description. Callouts identify the
instruction name, the required-or-optional indication and version (Required v 1.0), the instruction
mnemonic and operands (lqd rt,symbol(ra)), the instruction format, the binary instruction opcode
(00110100), the instruction description, and the instruction calculations.]
Conventions and Notations Used in This Manual


Byte Ordering 


Throughout this document, standard IBM big-endian notation is used, meaning that bytes are numbered in
ascending order from left to right. Big-endian and little-endian byte ordering are described in the Cell
Broadband Engine Architecture document.


Bit Ordering 


Bits are numbered in ascending order from left to right with bit 0 representing the most-significant bit (MSb)
and bit 31 the least-significant bit (LSb).

[Figure: a 32-bit word with bit positions numbered 0 (MSb) on the left through 31 (LSb) on the right.]
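
Because most programming languages count bits from the least-significant end, it can help to see the conversion spelled out. The following Python sketch is not part of the architecture; the function name bit and the default width of 32 are assumptions made for illustration.

```python
def bit(value: int, p: int, width: int = 32) -> int:
    """Return bit p of value, where bit 0 is the most-significant bit (MSb)."""
    if not 0 <= p < width:
        raise ValueError("bit index out of range")
    # Bit p in MSb-first numbering sits (width - 1 - p) places above the LSb.
    return (value >> (width - 1 - p)) & 1
```

With this numbering, bit 0 of 0x80000000 is 1 and bit 31 of 0x00000001 is 1.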








































































































Bit Encoding 


The notation for bit encoding is as follows: 


* Hexadecimal values are preceded by 0x.
For example: 0x0A00.


* Binary values are preceded by 0b.
For example: 0b1010.


Instructions, Mnemonics, and Operands 


This document uses the following conventions for instructions, mnemonics, and operands:

* Instruction mnemonics are written in bold type. For example, sync for the synchronize instruction.


* Each instruction description in this document indicates whether the instruction is optional or required and
which version of the architecture introduced the instruction. The instruction description includes the
mnemonic and a formatted list of operands as shown in Figure i on page 15. In addition, each instruction
description provides a sample assembler language statement showing the format supported by the
assembler.


* Variables are written in italic type.
Referencing Registers or Channels, Fields, and Bit Ranges


Registers and channels are referred to by their full name or by their mnemonic (also called the short name). 
Fields are referred to by their field name or by their bit position. 


Usually, the register mnemonic is followed by the field name or bit position enclosed in brackets. For 
example: MSR[R]. An equal sign followed by a value indicates the value to which the field is set; for example, 
MSR[R] = 0. When referencing a range of bit numbers, the starting and ending bit numbers are enclosed in 
brackets and separated by a colon; for example, [0:34]. 


The following table describes how registers, fields, and bit ranges are referred to in this document and 
provides examples of the references. 





Type of Reference: Reference to a specific register and a specific field using the register short name
and the field name
    Format:  Register_Short_Name[Field_Name]
    Example: MSR[R]

Type of Reference: Reference to a field using the field name
    Format:  [Field_Name]
    Example: [R]

Type of Reference: Reference to a specific register and to multiple fields using the register short
name and the field names
    Format:  Register_Short_Name[Field_Name1, Field_Name2]
    Example: MSR[FE0, FE1]

Type of Reference: Reference to a specific register and to multiple fields using the register short
name and the bit positions
    Format:  Register_Short_Name[Bit_Number, Bit_Number]
    Example: MSR[52, 55]

Type of Reference: Reference to a specific register and to a field using the register short name and
the bit position or the bit range
    Format:  Register_Short_Name[Bit_Number]
    Example: MSR[52]
    Format:  Register_Short_Name[Starting_Bit_Number:Ending_Bit_Number]
    Example: MSR[39:44]

Type of Reference: A field name followed by an equal sign (=) and a value indicates the value for
that field.
    Format:   Register_Short_Name[Field_Name]=n (see note 1)
    Examples: MSR[FE0]=0b1
              MSR[FE]=0x1
    Format:   Register_Short_Name[Bit_Number]=n (see note 1)
    Examples: MSR[52]=0b0
              MSR[52]=0x0
    Format:   Register_Short_Name[Starting_Bit_Number:Ending_Bit_Number]=n (see note 1)
    Examples: MSR[39:43]=0b10010
              MSR[39:43]=0x11

1. Where n is the binary or hexadecimal value for the field or bits specified in the brackets.





Version 1.2 Preface 
January 27, 2007 Page 17 of 278 




Register Transfer Language Instruction Definitions 


This document generally follows the register transfer language (RTL) terminology and notation in the
PowerPC Architecture.


RTL descriptions are provided for most instructions and are intended to clarify the verbal description, which is 
the primary definition. The following conventions apply to the RTL: 
* LocStor(x,y) refers to the y bytes starting at local storage location x.


* RepLeftBit(x,y) returns the value x with its leftmost bit replicated enough times to produce a total length
of y.


* The program counter (PC) contains the address of the instruction being executed when used as an
operand, or the address of the next instruction when used as a target.


* Temporary names used in the RTL descriptions have the widths shown in Table i.


Table i. Temporary Names Used in the RTL and Their Widths 
































Temporary Name                             Width
b, byte, byte1, byte2, c                   8 bits
ns                                         16 bits
bbbb, EA, QA, t, t0, t1, t2, t3, u, v      32 bits
Q, R, Memdata                              128 bits
Rconcat                                    256 bits
i, j, k, m                                 Meta (for description only)
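
The two RTL helper functions, LocStor and RepLeftBit, can be modeled in Python as follows. This is a sketch for readers, not a definition: the names rep_left_bit and loc_stor, the explicit x_width parameter (the RTL infers the width of x from context), and the use of a bytes object to stand in for local storage are all assumptions.

```python
def rep_left_bit(x: int, x_width: int, y: int) -> int:
    """Model of RepLeftBit(x, y) for an x_width-bit value x.

    The leftmost bit of x is replicated until the result is y bits long.
    """
    if y < x_width:
        raise ValueError("target length is shorter than the value")
    leftmost = (x >> (x_width - 1)) & 1
    if leftmost:
        # Fill the (y - x_width) new high-order bits with ones.
        x |= ((1 << (y - x_width)) - 1) << x_width
    return x


def loc_stor(local_storage: bytes, x: int, y: int) -> bytes:
    """Model of LocStor(x, y): the y bytes starting at local storage location x."""
    return local_storage[x:x + y]
```

For example, replicating the leftmost bit of the 4-bit value 0b1000 to a length of 8 gives 0b11111000, whereas 0b0111 becomes 0b00000111.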
The instructions in this document can contain one or more of the fields described in Table ii.


Table ii. Instruction Fields 


















































Field          Description
/, //, ///     Reserved field in an instruction.
               Reserved fields that are currently not in use contain zeros even where this is not
               checked by the architecture; this allows for future use without causing incompatibility.
I7             7-bit immediate
I8             8-bit immediate
I10            10-bit immediate
I16            16-bit immediate
OP or OPCD     Opcode
RA[18-24]      Field used to specify a general-purpose register (GPR) to be used as a source or as a target.
RB[11-17]      Field used to specify a GPR to be used as a source or as a target.
RC[4-10]       Field used to specify a GPR to be used as a source or as a target.
RT[25-31]      Field used to specify a GPR to be used as a target.
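
As an illustration of how the bit positions in Table ii map onto a 32-bit instruction word, the following Python sketch extracts the RB, RA, and RT fields. The helper names field and register_fields are hypothetical, and the sketch covers only these three fields at the positions listed in the table.

```python
def field(word: int, first: int, last: int) -> int:
    """Extract bits first..last (inclusive) of a 32-bit word, where bit 0 is the MSb."""
    width = last - first + 1
    return (word >> (31 - last)) & ((1 << width) - 1)


def register_fields(word: int) -> dict:
    """Return the 7-bit register numbers held in RB[11-17], RA[18-24], and RT[25-31]."""
    return {
        "RB": field(word, 11, 17),
        "RA": field(word, 18, 24),
        "RT": field(word, 25, 31),
    }
```

Each field is 7 bits wide, which matches the 128 general-purpose registers that a register number must be able to address.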
Instruction Operation Notations


The instructions in this document use the notations described in Table iii. This table is ordered with respect to 
the order of precedence, where the first operator in the table binds most tightly. 


Table iii. Instruction Operation Notations 



















































































Notation                    Description
X_p                         Means bit p of register or value field X
X_{p:q}                     Means bits p through q inclusive of register or value X
X^p                         Means byte p of register or value X
X^{p:q}                     Means bytes p through q inclusive of register or value X
X_{p::q}                    Means bits p and the bits that follow for a total of q bits
X^{p::q}                    Means bytes p and the bytes that follow for a total of q bytes
_p0 and _p1                 Mean a string of p 0 bits and of p 1 bits. (Note 1)
¬                           A unary NOT operator (Note 2)
*                           Signed multiplication (Note 3)
|*|                         Unsigned multiplication (Note 3)
+                           Two's complement addition (Note 2)
-                           Two's complement subtraction, unary minus (Note 2)
=, ≠                        Equals and Not Equals relations
<, ≤, >, ≥                  Signed comparison relations
<u, >u                      Unsigned comparison relations
&                           AND (Note 2)
|                           OR (Note 2)
⊕                           Exclusive OR (a & ¬b | ¬a & b) (Note 2)
←                           Assignment
LSA                         Local Storage Address
LSLR                        Local Storage Limit Register
LocStor(LSA, width)         Contents of the number of bytes indicated by the width variable in local
                            storage at the LSA.
if (cond) then ... else ... Conditional execution. Else is optional. The range of the then and else
                            clauses is indicated by indention. When the clauses are single statements,
                            they are shown on the same line as the corresponding if and else.
for ... end                 For loop. To and by clauses specify incrementing an iteration variable, and
                            a while clause provides termination conditions.
do ... while (cond)         Do loop. While clause provides termination conditions.
/, //, ///                  Reserved field in an instruction.
                            Reserved fields are presently unused and should contain zeros, even where
                            this is not checked by the architecture, to allow for future use without
                            causing incompatibility.

1. This is different from the PowerPC notation, which uses a leading superscript rather than a subscript.
2. The result of this operator is a bit vector of the same width as the input operands.
3. The result of this operator is a bit vector of the width of the sum of the operand widths.
Revision Log

Each release of this document supersedes all previously released versions. The revision log lists all
significant changes made to the document since its initial release. In the rest of the document, change bars
in the margin indicate that the adjacent text was significantly modified from the previous release of this
document.


Revision Date: January 27, 2007

Contents of Modification (Version 1.2):

* Revised the introduction to the revision log (see Revision Log on page 21).

* Updated a figure to illustrate the revised instruction format (see Figure i Format of an Instruction
Description on page 15). Also, updated the description on instruction conventions (see Instructions,
Mnemonics, and Operands on page 16).

* Corrected and clarified the programming note associated with the Multiply High instructions and added a
code sample (see Multiply High on page 77).

* Deleted "nonzero" from the description of an IEEE noncompliant result (see Section 9.1 Single Precision
(Extended-Range Mode) on page 195).

* Indicated that an exponent field of all ones is reserved for Infinity as well as Not-a-Number (NaN) fields
(see Table 9-3 Double-Precision (IEEE Mode) Minimum and Maximum Values on page 197).

* Changed the description of handling denormal inputs (see Section 9.2.1 Conversions Between
Single-Precision and Double-Precision Format on page 198).

* Deleted "nonzero" from the description of FPSCR[31] (see Section 9.3 Floating-Point Status and Control
Register on page 200).

* Added five optional instructions (see Double Floating Compare Equal on page 226, Double Floating
Compare Magnitude Equal on page 227, Double Floating Compare Greater Than on page 228, Double
Floating Compare Magnitude Greater Than on page 229, and Double Floating Test Special Value on
page 230).

* Changed "coherency" to "consistency" in two places to conform to the terminology used in Table 13-2
Synchronization Instructions on page 255 (see Section 13.3 Synchronization Primitives on page 254).

* Added the new instructions to Appendix A (see Table A-1 Instructions Sorted by Mnemonic on page 259).

* Made various editorial changes to the glossary (see Glossary on page 267).

* Revised the format of the instruction descriptions throughout. The instruction heading now indicates
whether the instruction is optional or required and in which version of the architecture the instruction was
introduced.
Corrected the expansion of the bisled instruction mnemonic (see Section 2 SPU Architectural Overview on
page 25).

Corrected the mnemonic for the Add Word instruction (see Multiply High on page 77).

Revised the description of the Select Bits instruction (see page 115). Revised several pro-
Rotate and Mask Word on page 138, and Rotate and Mask Word Immediate on page 139).

Explained the inline prefetch (see Hint for Branch (r-form) on page 192).
tion dependent.
for each slice can be controlled independently (see Section 9.2 on page 197).

Expanded the explanation of how denormal inputs are handled (see Section 9.2.1 on
Corrected the description of the lnop instruction (see No Operation (Load) on page 240).
Move from Special-Purpose Register on page 244 and Move to Special-Purpose Register on page 245).
1. Introduction


The purpose of the Synergistic Processor Unit (SPU) Instruction Set Architecture (ISA) document is to
describe a processor architecture that can fill a void between general-purpose processors and
special-purpose hardware. Whereas the objective of general-purpose processor architectures is to achieve
the best average performance on a broad set of applications, and the objective of special-purpose hardware
is to achieve the best performance on a single application, the purpose of the architecture described in this
document is to achieve leadership performance on critical workloads for game, media, and broadband systems.
The purpose of the Synergistic Processor Unit Instruction Set Architecture (SPU ISA) and the Cell Broadband
Engine Architecture (CBEA) is to provide information that allows a high degree of control by expert (real-time)
programmers while still maintaining ease of programming.


The SPU has the following key workloads: 


* The graphics pipeline, which includes surface subdivision and rendering 
* Stream processing, which includes encoding, decoding, encryption, and decryption 
* Modeling, which includes game physics 


The implementations of the SPU ISA achieve better performance-to-cost ratios than general-purpose
processors because the SPU ISA implementations require approximately half the power and approximately
half the chip area for equivalent performance. This is made possible by the key features of the architecture
and implementation listed in Table 1-1.


Table 1-1. Key Features of the SPU ISA Architecture and Implementation

Feature
    Description

128-bit SIMD execution unit organization
    Many of the applications previously mentioned allow for single-instruction, multiple-data (SIMD)
    concurrency. In an SIMD architecture, the cost (area and power) of fetching and decoding instructions
    is amortized over the multiple data elements processed. A 128-bit (most commonly 4-way 32-bit) SIMD
    has commonality with SIMD processing units in other general-purpose processor architectures and the
    existing code base to support it.

Software-managed memory
    Whereas most processors reduce latency to memory by employing caches, the SPU in the CBEA
    implements a small local memory rather than a cache. This approach requires approximately half the
    area per byte and significantly less power per access, as compared to a cache hierarchy. In addition, it
    provides a high degree of control for real-time programming. Because the latency and instruction
    overhead associated with direct memory access (DMA) transfers exceed the latency of servicing a
    cache miss, this approach achieves an advantage only if the DMA transfer size is sufficiently large and
    is sufficiently predictable (that is, DMA can be issued before data is needed).

Load/store architecture to support efficient static random access memory (SRAM) design
    The SPU ISA microarchitecture is organized to enable efficient implementations that use single-ported
    (local storage) memory.

Large unified register file
    The 128-entry register file in the SPU architecture allows for deeply pipelined high-frequency
    implementations without requiring register renaming to avoid register starvation. This is especially
    important when latencies are covered by software loop unrolling or other interleaving techniques.
    Rename hardware typically consumes a significant fraction of the area and power in modern
    high-frequency general-purpose processors.

ISA support to eliminate branches
    The SPU ISA defines compare instructions to set masks that can be used in three-operand select
    instructions to create efficient conditional assignments. Such conditional assignments can be used to
    avoid difficult-to-predict branches.

ISA support to avoid branch penalties on predictable branches
    The SPU hint-for-branch instructions allow programs to avoid a penalty on taken branches when the
    branch can be predicted sufficiently early. This mechanism achieves an advantage over common branch
    prediction schemes in that it does not require storing history associated with previous branches and
    thus saves area and power. The ISA solves the problem associated with hint bits in the branch
    instructions themselves, where considerable look-ahead (branch scan) in the instruction stream is
    necessary to process branches early enough that their targets are available when needed.

Graphics-oriented single-precision (extended-range) floating-point support
    Much of the code base for game applications assumes a single-precision floating-point format that is
    distinct from the IEEE 754 format commonly implemented on general-purpose processors. For details
    on the single-precision format, see Section 9 Floating-Point Instructions on page 195.

Channel architecture
    Blocking channels for communication with the synergistic Memory Flow Controller (MFC) or other parts
    of the system external to the SPU provide an efficient mechanism to wait for the completion of external
    events without polling or interrupts/wait loops, both of which burn power needlessly.

User-only architecture
    The SPU does not include certain features common in general-purpose processors. Specifically, the
    processor does not support a supervisor mode.
2. SPU Architectural Overview


This section provides an overview of the SPU architecture. 


The SPU architecture defines a set of 128 general-purpose registers (GPRs), each of which contains 128 
data bits. Registers are used to hold fixed-point and floating-point data. Instructions operate on the full width 
of the register, treating the register as multiple operands of the same format. 


The SPU supports halfword (16-bit) and word (32-bit) integers in signed format, and it provides limited 
support for 8-bit unsigned integers. The number representation is two's complement. 


The SPU supports single-precision (32-bit) and double-precision (64-bit) floating-point data in IEEE 754 
format. However, full single-precision IEEE 754 arithmetic is not implemented. 


The architecture does not use a condition register. Instead, comparison operations set results that are either 
0 (false) or 1 (true), and that are the same width as the operands being compared. These results can be used 
for bitwise masking, the select instruction, or conditional branches. 
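
The following Python sketch illustrates how a comparison result can drive a branch-free conditional assignment on one 32-bit element. It is a reader's model rather than a description of any particular instruction: the function names are hypothetical, and the sketch assumes that a true comparison sets every bit of the result element to 1 so that the result can be used directly as a bitwise mask.

```python
WORD_MASK = 0xFFFFFFFF  # one 32-bit element


def compare_greater(a: int, b: int) -> int:
    """Return an all-ones word when a > b and an all-zeros word otherwise."""
    return WORD_MASK if a > b else 0


def select(a: int, b: int, mask: int) -> int:
    """Take the bits of b where mask is 1 and the bits of a where mask is 0."""
    return ((a & ~mask) | (b & mask)) & WORD_MASK


def maximum(a: int, b: int) -> int:
    """Branch-free maximum of two non-negative 32-bit values."""
    return select(b, a, compare_greater(a, b))
```

Because the mask has the same width as the operands, the conditional assignment needs only bitwise operations and no branch.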


The SPU loads and stores access a private memory called local storage. The SPU loads and stores transfer 
quadwords between GPRs and local storage. Implementations can feature varying local storage sizes; 
however, the local storage address space is limited to 4 GB. 


The SPU can send and receive data to external devices through the channel interface. SPU channel
instructions transfer quadwords (128 bits) between GPRs and the channel interface. Up to 128 channels are
supported. Two channels are defined to access Save-and-Restore Register 0 (SRR0), which holds the
address used by the Interrupt Return instruction (iret). The SPU also supports up to 128 special-purpose
registers (SPRs). The Move To Special Purpose Register (mtspr) and Move From Special Purpose Register
(mfspr) instructions move 128-bit data between GPRs and SPRs.


The SPU also monitors a status signal called the external condition. The Branch Indirect and Set Link If 
External Data (bisled) instruction conditionally branches based upon the status of the external condition. The 
SPU interrupt facility can be configured to branch to an interrupt handler at address 0 when the external 
condition is true. 


2.1 Data Representation 


The architecture defines the following: 


* An 8-bit byte 

* A 16-bit halfword 

* A 32-bit word 

* A 64-bit doubleword 
* A 128-bit quadword 


Byte ordering defines how the bytes that make up halfwords, words, doublewords, and quadwords are
ordered in memory. The SPU supports most-significant byte (MSB) ordering. With MSB ordering, also called
big endian, the most-significant byte is located in the lowest addressed byte position in a storage unit (byte 0).
Instructions are described in this document as they appear in memory, with successively higher addressed
bytes appearing toward the right.
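
MSB ordering can be demonstrated with Python's built-in integer conversions. The helper name byte_n is hypothetical and the example value is arbitrary.

```python
def byte_n(value: int, n: int, width_bytes: int) -> int:
    """Return byte n of value under big-endian ordering (byte 0 is most significant)."""
    return value.to_bytes(width_bytes, byteorder="big")[n]


# A 32-bit word as it would appear in memory, lowest address first.
stored_word = (0x0A0B0C0D).to_bytes(4, byteorder="big")
```

Here byte 0 of the stored word is 0x0A, the most-significant byte, and byte 3 is 0x0D.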


The conventions for bit and byte numbering within the various width storage units are shown in the figures 
listed in Table 2-1. 
Table 2-1. Bit and Byte Numbering Figures

For a figure that shows...                 See...
Bit and Byte Numbering of Halfwords        Figure 2-1 on page 26
Bit and Byte Numbering of Words            Figure 2-2 on page 26
Bit and Byte Numbering of Doublewords      Figure 2-3 on page 26
Bit and Byte Numbering of Quadwords        Figure 2-4 on page 27
Register Layout of Data Types              Figure 2-5 on page 28


These conventions apply to integer and floating-point data (where the most-significant byte holds the sign
and at a minimum the start of the exponent). The figures show byte numbers on the top and bit numbers
below.


Figure 2-1. Bit and Byte Numbering of Halfwords

[Figure: byte numbers 0 and 1 (MSB to LSB) shown above bit numbers 0 through 15.]


Figure 2-2. Bit and Byte Numbering of Words

[Figure: byte numbers 0 through 3 (MSB to LSB) shown above bit numbers 0 through 31.]


Figure 2-3. Bit and Byte Numbering of Doublewords

[Figure: byte numbers 0 through 7 (MSB to LSB) shown above the corresponding bit numbers.]


Figure 2-4. Bit and Byte Numbering of Quadwords

[Figure not recoverable.]
2.2 Data Layout in Registers


All GPRs are 128 bits wide. The leftmost word (bytes 0, 1, 2, and 3) of a register is called the preferred slot.
When instructions use or produce scalar operands or addresses, the values are in the preferred slot. A set of
store assist instructions is available to help store bytes, halfwords, words, and doublewords. Figure 2-5
illustrates how these data types are laid out in a general-purpose register (GPR).
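
The preferred slot can be modeled by treating a register as 16 bytes in big-endian order. In this Python sketch the function names are hypothetical, and filling the remaining 12 bytes with zeros is a choice made for the example only.

```python
def preferred_slot(register: bytes) -> int:
    """Return the 32-bit value held in bytes 0 through 3 of a 128-bit register."""
    if len(register) != 16:
        raise ValueError("a register is 16 bytes wide")
    return int.from_bytes(register[0:4], byteorder="big")


def scalar_to_register(value: int) -> bytes:
    """Place a 32-bit scalar in the preferred slot; the other bytes are zeroed here."""
    return (value & 0xFFFFFFFF).to_bytes(4, byteorder="big") + bytes(12)
```

A scalar address used by a load or store would be read from the register in this way.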


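Figure 2-5. Register Layout of Data Types

[Figure: a register drawn as 16 bytes with byte index 0 through 15, the preferred slot marked at the left,
and one row per data type (BYTE, HALFWORD, and the wider types) showing where each is placed.]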







































































2.3 Instruction Formats 


There are six basic instruction formats. These instructions are all 32 bits long. Minor variations of these
formats are also used. Instructions in memory must be aligned on word boundaries. The instruction formats
are shown in Figures 2-6 through 2-11.

Note: The OP code field is presented throughout this document in binary format. 

Figure 2-6. RR Instruction Format

Field   OP      RB       RA       RT
Bits    0-10    11-17    18-24    25-31


Figure 2-7. RRR Instruction Format

Field   OP      RT      RB       RA       RC
Bits    0-3     4-10    11-17    18-24    25-31


Figure 2-8. RI7 Instruction Format

Field   OP      I7       RA       RT
Bits    0-10    11-17    18-24    25-31


Figure 2-9. RI10 Instruction Format

Field   OP      I10     RA       RT
Bits    0-7     8-17    18-24    25-31


Figure 2-10. RI16 Instruction Format

Field   OP      I16     RT
Bits    0-8     9-24    25-31


Figure 2-11. RI18 Instruction Format

Field   OP      I18     RT
Bits    0-6     7-24    25-31
3. Memory—Load/Store Instructions


This section lists and describes the SPU load/store instructions. 


The SPU architecture defines a private memory, also called local storage, which is byte-addressed. Load and 
store instructions combine operands from one or two registers and an immediate value to form the effective 
address of the memory operand. Only aligned 16-byte-long quadwords can be loaded and stored. Therefore, 
the rightmost 4 bits of an effective address are always ignored and are assumed to be zero. 


The size of the SPU local storage address space is 2^32 bytes. However, an implementation generally has a
smaller actual memory size. The effective size of the memory is specified by the Local Storage Limit Register 
(LSLR). Implementations can provide methods for accessing the LSLR; however, these methods are outside 
the scope of the SPU Instruction Set Architecture. Implementations can allow modifications to the LSLR 
value; however, the LSLR must not change while the SPU is running. Every effective address is ANDed with 
the LSLR before it is used to reference memory. The LSLR can be used to make the memory appear to be 
smaller than it is, thus providing compatibility for programs compiled for a smaller memory size. The LSLR 
value is a mask that controls the effective memory size. This value must have the following properties: 


* Limit the effective memory size to be less than or equal to the actual memory size 


* Be monotonic, so that the least-significant 4 mask bits are ones and so that there is at most a single
transition from ‘1’ to ‘0’ and no transitions from ‘0’ to ‘1’ as the bits are read from the least-significant to the
most-significant bit. That is, the value must be 2^n - 1, where n is log2 (effective memory size).


The effect of this is that references to memory beyond the last byte of the effective size are wrapped—that is, 
interpreted modulo the effective size. This definition allows an address to be used for a load before it has 
been checked for validity, and makes it possible to overlap memory latency with other operations more easily. 
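
The LSLR properties and the resulting address wrapping can be checked with a short Python sketch. The function names are hypothetical, and actual_size is assumed to be given in bytes.

```python
def is_valid_lslr(lslr: int, actual_size: int) -> bool:
    """Check that lslr is a 2**n - 1 mask with its low 4 bits set and that
    the effective size (lslr + 1) does not exceed the actual memory size."""
    effective_size = lslr + 1
    # A value of the form 2**n - 1 shares no set bits with its successor.
    is_mask = lslr >= 0xF and (lslr & effective_size) == 0
    return is_mask and lslr <= 0xFFFFFFFF and effective_size <= actual_size


def wrap(address: int, lslr: int) -> int:
    """AND an effective address with the LSLR, as is done before every reference."""
    return address & lslr
```

With an LSLR of 0x0003FFFF, an address of 0x40010 wraps to 0x10, which is the same result as taking the address modulo the 256 KB effective size.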


Stores of less than a quadword are performed by a load-modify-store sequence. A group of assist instructions
is provided for this type of sequence. The assist instruction names are prefixed with Generate Controls.
These instructions are described in this section. For example, see Generate Controls for Byte Insertion
(d-form) on page 40.
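
The load-modify-store sequence for a single byte can be outlined in Python as follows. This sketch does not model the Generate Controls instructions themselves; it shows only the three steps of the sequence. The function name store_byte is hypothetical, local storage is represented by a bytearray, and masking with the LSLR is omitted for brevity.

```python
def store_byte(local_storage: bytearray, address: int, value: int) -> None:
    """Store one byte by loading, modifying, and storing the enclosing quadword."""
    base = address & ~0xF                                 # aligned quadword address
    quadword = bytearray(local_storage[base:base + 16])   # load the quadword
    quadword[address & 0xF] = value & 0xFF                # modify one byte
    local_storage[base:base + 16] = quadword              # store the quadword
```

Only the addressed byte changes; the other 15 bytes of the quadword are written back unmodified.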


In a typical system configuration, the SPU local storage is externally accessible. The possibility therefore 
exists of SPU memory being modified asynchronously during the course of execution of an SPU program. All 
references (loads, stores) to local storage by an SPU program, and aligned external references to SPU 
memory, are atomic. Unaligned references are not atomic, and portions of such operations can be observed 
by a program executing in the SPU. Table 3-1 shows sample LSLRs and the local storage address space 
size they correspond to. 


Table 3-1. Example LSLR Values and Corresponding Local Storage Sizes

LSLR           Local Storage Size
0x0003 FFFF    256 KB
0x0001 FFFF    128 KB
0x0000 FFFF    64 KB
0x0000 7FFF    32 KB

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Load Quadword (d-form) Required v 1.0 
lqd rt,symbol(ra)

Instruction format: opcode (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

The local storage address is computed by adding the signed value in the I10 field, with 4 zero bits appended,
to the value in the preferred slot of register RA and forcing the rightmost 4 bits of the sum to zero. The 16
bytes at the local storage address are placed into register RT. This instruction is computed using the following
formula:

LSA ← (RepLeftBit(I10 || 0b0000,32) + RA^0:3) & LSLR & 0xFFFFFFF0
RT ← LocStor(LSA,16)
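
As a rough illustration of the d-form address calculation, the following Python sketch mirrors the formula above. The helper names `rep_left_bit` and `lqd_address`, and the default 256 KB LSLR, are assumptions made for this example only.

```python
def rep_left_bit(value: int, width: int, bits: int = 32) -> int:
    """Sign-extend a width-bit field to `bits` bits; the result is returned unsigned."""
    sign = 1 << (width - 1)
    value &= (1 << width) - 1
    return ((value ^ sign) - sign) & ((1 << bits) - 1)


def lqd_address(i10: int, ra_preferred: int, lslr: int = 0x0003FFFF) -> int:
    """Local storage address for a d-form quadword access."""
    # I10 || 0b0000 is a 14-bit quantity that is then sign-extended to 32 bits.
    offset = rep_left_bit((i10 & 0x3FF) << 4, 14)
    return (offset + ra_preferred) & 0xFFFFFFFF & lslr & 0xFFFFFFF0


# I10 = 2 means +32 bytes; the low 4 bits of the sum are forced to zero.
print(hex(lqd_address(2, 0x1007)))
```

The same address formula applies to the d-form store, which differs only in the direction of the transfer.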
Load Quadword (x-form) Required v 1.0
lqx rt,ra,rb

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

The local storage address is computed by adding the value in the preferred slot of register RA to the value in 
the preferred slot of register RB and forcing the rightmost 4 bits of the sum to zero. The 16 bytes at the local 
storage address are placed into register RT. This instruction is computed using the following formula: 





LSA ← (RA^0:3 + RB^0:3) & LSLR & 0xFFFFFFF0
RT ← LocStor(LSA,16)
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Load Quadword (a-form) Required v 1.0 
lqa rt,symbol

Instruction format: opcode 001100001 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

The value in the I16 field, with 2 zero bits appended and extended on the left with copies of the
most-significant bit, is used as the local storage address. The 16 bytes at the local storage address are loaded into
register RT.

LSA ← RepLeftBit(I16 || 0b00,32) & LSLR & 0xFFFFFFF0
RT ← LocStor(LSA,16)
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Load Quadword Instruction Relative (a-form) Required v 1.0 
lqr rt,symbol

Instruction format: opcode 001100111 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

The value in the I16 field, with 2 zero bits appended, is added to the program counter (PC) to form the local
storage address. The 16 bytes at the local storage address are loaded into register RT.

LSA ← (RepLeftBit(I16 || 0b00,32) + PC) & LSLR & 0xFFFFFFF0
RT ← LocStor(LSA,16)
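
The difference between the absolute (a-form) and instruction-relative address calculations can be sketched in Python as follows. The function names and the 256 KB LSLR used in the example are illustrative assumptions, not part of the architecture definition.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    value &= (1 << width) - 1
    return (value ^ sign) - sign


def lqa_address(i16: int, lslr: int) -> int:
    """Absolute form: sign-extended (I16 || 0b00), masked and quadword aligned."""
    return sign_extend((i16 & 0xFFFF) << 2, 18) & MASK32 & lslr & 0xFFFFFFF0


def lqr_address(i16: int, pc: int, lslr: int) -> int:
    """Instruction-relative form: the same offset, added to the program counter."""
    return (sign_extend((i16 & 0xFFFF) << 2, 18) + pc) & MASK32 & lslr & 0xFFFFFFF0


LSLR = 0x0003FFFF
print(hex(lqa_address(0x0010, LSLR)))         # word offset 0x10 is byte address 0x40
print(hex(lqr_address(0xFFFE, 0x100, LSLR)))  # 8 bytes before PC, rounded down to a quadword
```

A negative absolute offset wraps into the top of local storage because of the LSLR mask.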
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Store Quadword (d-form) 
stqd rt,symbol(ra) 

Instruction format: opcode (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

The local storage address is computed by adding the signed value in the I10 field, with 4 zero bits appended,
to the value in the preferred slot of register RA and forcing the rightmost 4 bits of the sum to zero. The
contents of register RT are stored at the local storage address.

LSA ← (RepLeftBit(I10 || 0b0000,32) + RA^0:3) & LSLR & 0xFFFFFFF0
LocStor(LSA,16) ← RT
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Store Quadword (x-form) Required v 1.0 
stqx rt,ra,rb 

Instruction format: opcode | RB | RA | RT

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The local storage address is computed by adding the value in the preferred slot of register RA to the value in 
the preferred slot of register RB and forcing the rightmost 4 bits of the sum to zero. The contents of 
register RT are stored at the local storage address. 





LSA ← (RA^0:3 + RB^0:3) & LSLR & 0xFFFFFFF0
LocStor(LSA,16) ← RT
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Store Quadword (a-form) Required v 1.0 
stqa rt,symbol

Instruction format: opcode 001000001 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

The value in the I16 field, with 2 zero bits appended and extended on the left with copies of the
most-significant bit, is used as the local storage address. The contents of register RT are stored at the location given by
the local storage address.

LSA ← RepLeftBit(I16 || 0b00,32) & LSLR & 0xFFFFFFF0
LocStor(LSA,16) ← RT
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Store Quadword Instruction Relative (a-form) Required v 1.0 
stqr rt,symbol

Instruction format: opcode 001000111 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

The value in the I16 field, with 2 zero bits appended and extended on the left with copies of the
most-significant bit, is added to the program counter (PC) to form the local storage address. The contents of register RT
are stored at the location given by the local storage address.

LSA ← (RepLeftBit(I16 || 0b00,32) + PC) & LSLR & 0xFFFFFFF0
LocStor(LSA,16) ← RT
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Generate Controls for Byte Insertion (d-form) Required v 1.0 








cbd rt,symbol(ra)

Instruction format: opcode 00111110100 (bits 0-10) | I7 (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

A 4-bit address is computed by adding the value in the signed I7 field to the value in the preferred slot of 
register RA. The address is used to determine the position of the addressed byte within a quadword. Based 
on the position, a mask is generated that can be used with the Shuffle Bytes (shufb) instruction to insert a 
byte at the indicated position within a (previously loaded) quadword. The byte is taken from the rightmost byte 
position of the preferred slot of the RA operand of the shufb instruction. See Appendix B Details of the 
Generate Controls Instructions on page 265 for the details of the created mask. 

















t ← (RA^0:3 + RepLeftBit(I7,32)) & 0x0000000F
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t ← 0x03
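
The mask construction can be sketched in Python as shown below. The function name `cbd_mask` and the representation of the register as a 16-byte `bytes` object (byte 0 leftmost) are assumptions made for illustration.

```python
def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    value &= (1 << width) - 1
    return (value ^ sign) - sign


def cbd_mask(i7: int, ra_preferred: int) -> bytes:
    """Build the 16-byte insertion mask for a single byte."""
    t = (ra_preferred + sign_extend(i7, 7)) & 0xF
    mask = bytearray(range(0x10, 0x20))  # 0x10..0x1F, the initial value of RT
    mask[t] = 0x03                       # byte 3: rightmost byte of the preferred slot
    return bytes(mask)


# Address 0x1005 selects byte position 5 within its quadword.
print(cbd_mask(0, 0x1005).hex())
```

Every position except byte t keeps its 0x10..0x1F value; only position t is redirected to byte 3.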
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Generate Controls for Byte Insertion (x-form) Required v 1.0 
cbx rt,ra,rb

Instruction format: opcode 00111010100 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

A 4-bit address is computed by adding the value in the preferred slot of register RA to the value in the 
preferred slot of register RB. The address is used to determine the position of the addressed byte within a 
quadword. Based on the position, a mask is generated that can be used with the shufb instruction to insert a 
byte at the indicated position within a (previously loaded) quadword. The byte is taken from the rightmost byte 
position of the preferred slot of the RA operand of the shufb instruction. See Appendix B Details of the 
Generate Controls Instructions on page 265 for the details of the created mask. 

















t ← (RA^0:3 + RB^0:3) & 0x0000000F
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t ← 0x03
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Generate Controls for Halfword Insertion (d-form) Required v 1.0 








chd rt,symbol(ra)

Instruction format: opcode 00111110101 (bits 0-10) | I7 (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

A 4-bit address is computed by adding the value in the signed I7 field to the value in the preferred slot of 
register RA and forcing the least-significant bit to zero. The address is used to determine the position of an 
aligned halfword within a quadword. Based on the position, a mask is generated that can be used with the 
shufb instruction to insert a halfword at the indicated position within a quadword. The halfword is taken from 
the rightmost 2 bytes of the preferred slot of the RA operand of the shufb instruction. See Appendix B Details 
of the Generate Controls Instructions on page 265 for the details of the created mask. 

















t ← (RA^0:3 + RepLeftBit(I7,32)) & 0x0000000E
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t::2 ← 0x0203
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Generate Controls for Halfword Insertion (x-form) Required v 1.0 








chx rt,ra,rb

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

A 4-bit address is computed by adding the value in the preferred slot of register RA to the value in the 
preferred slot of register RB and forcing the least-significant bit to zero. The address is used to determine the 
position of an aligned halfword within a quadword. Based on the position, a mask is generated that can be 
used with the shufb instruction to insert a halfword at the indicated position within a quadword. The halfword 
is taken from the rightmost 2 bytes of the preferred slot of the RA operand of the shufb instruction. See 
Appendix B Details of the Generate Controls Instructions on page 265 for the details of the created mask. 

















t ← (RA^0:3 + RB^0:3) & 0x0000000E
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t::2 ← 0x0203
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Generate Controls for Word Insertion (d-form) Required v 1.0 


cwd rt,symbol(ra) 
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A 4-bit address is computed by adding the value in the signed I7 field to the value in the preferred slot of 
register RA and forcing the least-significant 2 bits to zero. The address is used to determine the position of an 
aligned word within a quadword. Based on the position, a mask is generated that can be used with the shufb 
instruction to insert a word at the indicated position within a quadword. The word is taken from the preferred 
slot of the RA operand of the shufb instruction. See Appendix B Details of the Generate Controls Instructions 
on page 265 for the details of the created mask. 








t ← (RA^0:3 + RepLeftBit(I7,32)) & 0x0000000C
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t::4 ← 0x00010203
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Generate Controls for Word Insertion (x-form) Required v 1.0 
cwx rt,ra,rb

Instruction format: opcode 00111010110 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

A 4-bit address is computed by adding the value in the preferred slot of register RA to the value in the 
preferred slot of register RB and forcing the least-significant 2 bits to zero. The address is used to determine 
the position of an aligned word within a quadword. Based on the position, a mask is generated that can be 
used with the shufb instruction to insert a word at the indicated position within a quadword. The word is taken 
from the preferred slot of the RA operand of the shufb instruction. See Appendix B Details of the Generate 
Controls Instructions on page 265 for the details of the created mask. 














t ← (RA^0:3 + RB^0:3) & 0x0000000C
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t::4 ← 0x00010203
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Generate Controls for Doubleword Insertion (d-form) Required v 1.0 








cdd rt,symbol(ra)

Instruction format: opcode (bits 0-10) | I7 (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

A 4-bit address is computed by adding the value in the signed I7 field to the value in the preferred slot of 
register RA and forcing the least-significant 3 bits to zero. The address is used to determine the position of an 
aligned doubleword within a quadword. Based on the position, a mask is generated that can be used with the 
shufb instruction to insert a doubleword at the indicated position within a quadword. The doubleword is taken 
from the leftmost 8 bytes of the RA operand of the shufb instruction. See Appendix B Details of the Generate 
Controls Instructions on page 265 for the details of the created mask. 

















t ← (RA^0:3 + RepLeftBit(I7,32)) & 0x00000008
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t::8 ← 0x0001020304050607
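
The byte, halfword, word, and doubleword forms of the Generate Controls instructions follow one pattern, which the following Python sketch tries to capture. The function `generate_controls` and its `size` parameter are hypothetical; they take an already-computed address rather than modeling the d-form and x-form operand handling.

```python
def generate_controls(address: int, size: int) -> bytes:
    """Build the insertion mask for an aligned element of `size` bytes (1, 2, 4, or 8)."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8")
    t = address & 0xF & ~(size - 1)      # force the low alignment bits to zero
    mask = bytearray(range(0x10, 0x20))  # initial value 0x10..0x1F
    if size == 8:
        source = bytes(range(0, 8))      # leftmost 8 bytes of the shufb RA operand
    else:
        source = bytes(range(4 - size, 4))  # rightmost bytes of the preferred slot
    mask[t:t + size] = source
    return bytes(mask)


print(generate_controls(0x1E, 2).hex())  # halfword at position 14
print(generate_controls(0x0B, 8).hex())  # doubleword at position 8
```

The masks for sizes 1, 2, and 4 point at bytes 3, 2:3, and 0:3 of the source register, which matches the constants 0x03, 0x0203, and 0x00010203 in the pseudocode.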
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Generate Controls for Doubleword Insertion (x-form) Required v 1.0 
cdx rt,ra,rb 

Instruction format: opcode | RB | RA | RT

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A 4-bit address is computed by adding the value in the preferred slot of register RA to the value in the 
preferred slot of register RB and forcing the least-significant 3 bits to zero. The address is used to determine 
the position of the addressed doubleword within a quadword. Based on the position, a mask is generated that
can be used with the shufb instruction to insert a doubleword at the indicated position within a quadword. The
doubleword is taken from the leftmost 8 bytes of the RA operand of the shufb instruction. See
Appendix B Details of the Generate Controls Instructions on page 265 for the details of the created mask.














t ← (RA^0:3 + RB^0:3) & 0x00000008
RT ← 0x101112131415161718191A1B1C1D1E1F
RT^t::8 ← 0x0001020304050607
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4. Constant-Formation Instructions 


This section lists and describes the SPU constant-formation instructions. 
Immediate Load Halfword Required v 1.0
ilh rt,symbol

Instruction format: opcode 010000011 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

For each of eight halfword slots:

* The value in the I16 field is placed in register RT.

Programming Note: There is no Immediate Load Byte instruction. However, that function can be performed
by the ilh instruction with a suitable value in the I16 field.

s ← I16
RT^0:1 ← s
RT^2:3 ← s
RT^4:5 ← s
RT^6:7 ← s
RT^8:9 ← s
RT^10:11 ← s
RT^12:13 ← s
RT^14:15 ← s
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Immediate Load Halfword Upper Required v 1.0 
ilhu rt,symbol 

Instruction format: opcode | I16 | RT

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For each of four word slots: 
* The value in the I16 field is placed in the leftmost 16 bits of the word.
* The remaining bits of the word are set to zero.

Programming Note: This instruction, when used in conjunction with Immediate Or Halfword Lower (iohl),
can be used to form an arbitrary 32-bit value in each word slot of a register. It can also be used alone to load
an immediate floating-point constant with up to 7 bits of significance in its fraction.

t ← I16 || 0x0000
RT^0:3 ← t
RT^4:7 ← t
RT^8:11 ← t
RT^12:15 ← t
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Immediate Load Word Required v 1.0 
il rt,symbol

Instruction format: opcode 010000001 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

For each of four word slots:

* The value in the I16 field is expanded to 32 bits by replicating the leftmost bit.
* The resulting value is placed in register RT.

t ← RepLeftBit(I16,32)
RT^0:3 ← t
RT^4:7 ← t
RT^8:11 ← t
RT^12:15 ← t
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Immediate Load Address Required v 1.0 
ila rt,symbol

Instruction format: opcode 0100001 (bits 0-6) | I18 (bits 7-24) | RT (bits 25-31)

For each of four word slots:

* The value in the I18 field is placed unchanged in the rightmost 18 bits of register RT.
* The remaining bits of register RT are set to zero.

Programming Note: Immediate Load Address can be used to load an immediate value, such as an address
or a small constant, without sign extension.

t ← ¹⁴0 || I18
RT^0:3 ← t
RT^4:7 ← t
RT^8:11 ← t
RT^12:15 ← t
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Immediate Or Halfword Lower Required v 1.0 
iohl rt,symbol

Instruction format: opcode 011000001 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

For each of four word slots:

* The value in the I16 field is prefaced with zeros and ORed with the value in register RT.
* The result is placed into register RT.

Programming Note: Immediate Or Halfword Lower can be used in conjunction with Immediate Load
Halfword Upper to load a 32-bit immediate value.

t ← 0x0000 || I16
RT^0:3 ← RT^0:3 | t
RT^4:7 ← RT^4:7 | t
RT^8:11 ← RT^8:11 | t
RT^12:15 ← RT^12:15 | t
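
The two-instruction sequence from the programming note can be sketched in Python as follows. Registers are modeled as lists of four 32-bit words; the helper `load_const32` is a hypothetical name for the ilhu/iohl pair.

```python
def ilhu(i16: int) -> list:
    """Immediate Load Halfword Upper: I16 in the upper half of each word, zeros below."""
    return [(i16 & 0xFFFF) << 16] * 4


def iohl(rt: list, i16: int) -> list:
    """Immediate Or Halfword Lower: OR zero-extended I16 into each word of RT."""
    return [word | (i16 & 0xFFFF) for word in rt]


def load_const32(value: int) -> list:
    """Form an arbitrary 32-bit constant in every word slot."""
    return iohl(ilhu(value >> 16), value & 0xFFFF)


print([hex(word) for word in load_const32(0xDEADBEEF)])
```

Because ilhu clears the lower halfword, the OR in iohl cannot be disturbed by stale register contents.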
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Form Select Mask for Bytes Immediate Required v 1.0 
fsmbi rt,symbol

Instruction format: opcode 001100101 (bits 0-8) | I16 (bits 9-24) | RT (bits 25-31)

The I16 field is used to create a mask in register RT by making eight copies of each bit. Bits in the operand
are related to bytes in the result in a left-to-right correspondence.

Programming Note: This instruction can be used to create a mask for use with the Select Bits instruction. It
can also be used to create masks for halfwords, words, and doublewords.

s ← I16
for j = 0 to 15
    if s_j = 0 then r^j ← 0x00
    else r^j ← 0xFF
end
RT ← r
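
A small Python sketch of the bit-to-byte expansion follows. It assumes the big-endian bit numbering used throughout this document, where bit 0 of I16 is the most-significant bit and corresponds to byte 0 of the result; the function name `fsmbi` simply mirrors the mnemonic.

```python
def fsmbi(i16: int) -> bytes:
    """Expand each of the 16 immediate bits into one byte of 0x00 or 0xFF."""
    return bytes(0xFF if (i16 >> (15 - j)) & 1 else 0x00 for j in range(16))


# A word-granular mask selecting the first word slot only.
print(fsmbi(0xF000).hex())
```

Setting groups of 2, 4, or 8 adjacent bits yields halfword, word, or doubleword masks, as the programming note describes.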
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5. Integer and Logical Instructions 


This section lists and describes the SPU integer and logical instructions. 
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Add Halfword Required v 1.0 
ah rt,ra,rb 

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of eight halfword slots:

* The operand from register RA is added to the operand from register RB.
* The 16-bit result is placed in RT.
* Overflows and carries are not detected.

RT^0:1 ← RA^0:1 + RB^0:1
RT^2:3 ← RA^2:3 + RB^2:3
RT^4:5 ← RA^4:5 + RB^4:5
RT^6:7 ← RA^6:7 + RB^6:7
RT^8:9 ← RA^8:9 + RB^8:9
RT^10:11 ← RA^10:11 + RB^10:11
RT^12:13 ← RA^12:13 + RB^12:13
RT^14:15 ← RA^14:15 + RB^14:15








Integer and Logical Instructions 


Version 1.2 
Page 58 of 278 


January 27, 2007 




Add Halfword Immediate Required v 1.0
ahi rt,ra,value

Instruction format: opcode 00011101 (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

For each of eight halfword slots:

* The signed value in the I10 field is added to the value in register RA.
* The 16-bit result is placed in RT.
* Overflows and carries are not detected.

s ← RepLeftBit(I10,16)
RT^0:1 ← RA^0:1 + s
RT^2:3 ← RA^2:3 + s
RT^4:5 ← RA^4:5 + s
RT^6:7 ← RA^6:7 + s
RT^8:9 ← RA^8:9 + s
RT^10:11 ← RA^10:11 + s
RT^12:13 ← RA^12:13 + s
RT^14:15 ← RA^14:15 + s
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Add Word Required v 1.0 
a rt,ra,rb

Instruction format: opcode 00011000000 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The operand from register RA is added to the operand from register RB.
* The 32-bit result is placed in register RT.
* Overflows and carries are not detected.

RT^0:3 ← RA^0:3 + RB^0:3
RT^4:7 ← RA^4:7 + RB^4:7
RT^8:11 ← RA^8:11 + RB^8:11
RT^12:15 ← RA^12:15 + RB^12:15
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Add Word Immediate Required v 1.0 











ai rt,ra,value

Instruction format: opcode 00011100 (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The signed value in the I10 field is added to the operand in register RA.
* The 32-bit result is placed in register RT.
* Overflows and carries are not detected.

t ← RepLeftBit(I10,32)
RT^0:3 ← RA^0:3 + t
RT^4:7 ← RA^4:7 + t
RT^8:11 ← RA^8:11 + t
RT^12:15 ← RA^12:15 + t
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Subtract from Halfword Required v 1.0 
sfh rt,ra,rb 

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of eight halfword slots:

* The value in register RA is subtracted from the value in RB.
* The 16-bit result is placed in register RT.
* Overflows and carries are not detected.

RT^0:1 ← RB^0:1 + (¬RA^0:1) + 1
RT^2:3 ← RB^2:3 + (¬RA^2:3) + 1
RT^4:5 ← RB^4:5 + (¬RA^4:5) + 1
RT^6:7 ← RB^6:7 + (¬RA^6:7) + 1
RT^8:9 ← RB^8:9 + (¬RA^8:9) + 1
RT^10:11 ← RB^10:11 + (¬RA^10:11) + 1
RT^12:13 ← RB^12:13 + (¬RA^12:13) + 1
RT^14:15 ← RB^14:15 + (¬RA^14:15) + 1
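
The "subtract from" operand order and the complement-and-increment formulation can be checked with a short Python sketch. Registers are modeled as lists of eight 16-bit values; the function name `sfh` mirrors the mnemonic and is otherwise an illustrative choice.

```python
def sfh(ra: list, rb: list) -> list:
    """Per halfword slot: RT = RB + (NOT RA) + 1, truncated to 16 bits (RB - RA)."""
    return [(b + (~a & 0xFFFF) + 1) & 0xFFFF for a, b in zip(ra, rb)]


# Note the operand order: RA is subtracted from RB.
print(sfh([3] * 8, [10] * 8))
print([hex(x) for x in sfh([1] * 8, [0] * 8)])  # wraps modulo 2**16
```

Adding the one's complement plus 1 is ordinary two's-complement subtraction, so the result equals (RB - RA) modulo 2^16 with no overflow indication.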








Subtract from Halfword Immediate Required v 1.0 
sfhi rt,ra,value 

Instruction format: opcode (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

For each of eight halfword slots:

* The value in register RA is subtracted from the signed value in the I10 field.
* The 16-bit result is placed in register RT.
* Overflows are not detected.

Programming Note: Although there is no Subtract Halfword Immediate instruction, its effect can be achieved
by using the Add Halfword Immediate with a negative immediate field.

t ← RepLeftBit(I10,16)
RT^0:1 ← t + (¬RA^0:1) + 1
RT^2:3 ← t + (¬RA^2:3) + 1
RT^4:5 ← t + (¬RA^4:5) + 1
RT^6:7 ← t + (¬RA^6:7) + 1
RT^8:9 ← t + (¬RA^8:9) + 1
RT^10:11 ← t + (¬RA^10:11) + 1
RT^12:13 ← t + (¬RA^12:13) + 1
RT^14:15 ← t + (¬RA^14:15) + 1



















Subtract from Word Required v 1.0 
sf rt,ra,rb

Instruction format: opcode 00001000000 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The value in register RA is subtracted from the value in register RB.
* The result is placed in register RT.
* Overflows and carries are not detected.

RT^0:3 ← RB^0:3 + (¬RA^0:3) + 1
RT^4:7 ← RB^4:7 + (¬RA^4:7) + 1
RT^8:11 ← RB^8:11 + (¬RA^8:11) + 1
RT^12:15 ← RB^12:15 + (¬RA^12:15) + 1
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Subtract from Word Immediate Required v 1.0 


sfi rt,ra,value 


Instruction format: opcode (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The value in register RA is subtracted from the value in the I10 field.
* The result is placed in register RT.
* Overflows and carries are not detected.

Programming Note: Although there is no Subtract Immediate instruction, its effect can be achieved by using
the Add Immediate with a negative immediate field.

t ← RepLeftBit(I10,32)
RT^0:3 ← t + (¬RA^0:3) + 1
RT^4:7 ← t + (¬RA^4:7) + 1
RT^8:11 ← t + (¬RA^8:11) + 1
RT^12:15 ← t + (¬RA^12:15) + 1
Add Extended Required v 1.0
addx rt,ra,rb

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The operand from register RA is added to the operand from register RB and the least-significant bit of the
operand from register RT.
* The 32-bit result is placed in register RT. Bits 0 to 30 of the RT input are reserved and should be zero.

RT^0:3 ← RA^0:3 + RB^0:3 + RT_31
RT^4:7 ← RA^4:7 + RB^4:7 + RT_63
RT^8:11 ← RA^8:11 + RB^8:11 + RT_95
RT^12:15 ← RA^12:15 + RB^12:15 + RT_127
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Carry Generate Required v 1.0 
cg rt,ra,rb

Instruction format: opcode 00011000010 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots: 
* The operand from register RA is added to the operand from register RB. 
* The carry-out is placed in the least-significant bit of register RT. 


* The remaining bits of RT are set to zero. 





for j = 0 to 15 by 4
    t_0:32 = (0 || RA^j::4) + (0 || RB^j::4)
    RT^j::4 ← ³¹0 || t_0
end
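
Carry Generate is typically combined with an add that consumes a carry-in (the least-significant bit of RT, as in the add-with-carry description earlier in this section) to build wider additions. The Python sketch below models a single word slot with plain integers. It is a scalar simplification under stated assumptions: the function names mirror the mnemonics, `add64` is hypothetical, and the step of moving the carry from the low word's slot to the high word's slot (which a real program would need, for example with a shuffle) is not modeled.

```python
MASK32 = 0xFFFFFFFF


def a(ra: int, rb: int) -> int:
    """Add Word: 32-bit sum, carry discarded."""
    return (ra + rb) & MASK32


def cg(ra: int, rb: int) -> int:
    """Carry Generate: the carry-out of the 32-bit add, as 0 or 1."""
    return ((ra & MASK32) + (rb & MASK32)) >> 32


def addx(ra: int, rb: int, rt: int) -> int:
    """Add with carry-in taken from the least-significant bit of RT."""
    return (ra + rb + (rt & 1)) & MASK32


def add64(a_hi: int, a_lo: int, b_hi: int, b_lo: int):
    """64-bit add composed from 32-bit pieces."""
    carry = cg(a_lo, b_lo)
    lo = a(a_lo, b_lo)
    hi = addx(a_hi, b_hi, carry)
    return hi, lo


print(add64(0x0, 0xFFFFFFFF, 0x0, 0x1))
```

The 33-bit intermediate in the pseudocode corresponds to the unmasked Python sum; shifting it right by 32 extracts bit t_0.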
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Carry Generate Extended Required v 1.0 








cgx rt,ra,rb

Instruction format: opcode 01101000010 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots: 


* The operand from register RA is added to the operand from register RB and the least-significant bit of 
register RT. 


* The carry-out is placed in the least-significant bit of register RT. 


* The remaining bits of RT are set to zero. Bits 0 to 30 of the RT input are reserved and should be zero. 





for j = 0 to 15 by 4
    t_0:32 = (0 || RA^j::4) + (0 || RB^j::4) + (³²0 || RT_(j*8+31))
    RT^j::4 ← ³¹0 || t_0
end
Subtract from Extended Required v 1.0
sfx rt,ra,rb

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The operand from register RA is subtracted from the operand from register RB. An additional ‘1’ is
subtracted from the result if the least-significant bit of RT is ‘0’.
* The 32-bit result is placed in register RT. Bits 0 to 30 of the RT input are reserved and should be zero.

RT^0:3 ← RB^0:3 + (¬RA^0:3) + RT_31
RT^4:7 ← RB^4:7 + (¬RA^4:7) + RT_63
RT^8:11 ← RB^8:11 + (¬RA^8:11) + RT_95
RT^12:15 ← RB^12:15 + (¬RA^12:15) + RT_127
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Borrow Generate Required v 1.0 








bg rt,ra,rb

Instruction format: opcode 00001000010 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* If the unsigned value of RA is greater than the unsigned value of RB, then ‘0’ is placed in register RT.
Otherwise, ‘1’ is placed in register RT.

for j = 0 to 15 by 4
    if (RB^j::4 ≥u RA^j::4) then RT^j::4 ← 1
    else RT^j::4 ← 0
end
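
Borrow Generate plays the same role for subtraction that Carry Generate plays for addition. The sketch below models one word slot with plain Python integers and composes a 64-bit subtraction from the word subtract, the borrow indicator, and the extended subtract whose borrow-in is the least-significant bit of RT (as described earlier in this section). The function `sub64` is hypothetical, and moving the borrow indicator between word slots is not modeled.

```python
MASK32 = 0xFFFFFFFF


def sf(ra: int, rb: int) -> int:
    """Subtract from Word: RB - RA modulo 2**32."""
    return (rb + (~ra & MASK32) + 1) & MASK32


def bg(ra: int, rb: int) -> int:
    """Borrow Generate: 1 when RB >= RA (unsigned), that is, no borrow; else 0."""
    return 1 if (rb & MASK32) >= (ra & MASK32) else 0


def sfx(ra: int, rb: int, rt: int) -> int:
    """Extended subtract: RB + NOT(RA) + LSB(RT)."""
    return (rb + (~ra & MASK32) + (rt & 1)) & MASK32


def sub64(a_hi: int, a_lo: int, b_hi: int, b_lo: int):
    """Compute b - a for 64-bit values split into 32-bit halves."""
    borrow = bg(a_lo, b_lo)
    lo = sf(a_lo, b_lo)
    hi = sfx(a_hi, b_hi, borrow)
    return hi, lo


# 0x1_00000000 - 1 = 0x0_FFFFFFFF
print(tuple(hex(x) for x in sub64(0x0, 0x1, 0x1, 0x0)))
```

Note the polarity: a result of 1 means that no borrow is needed, which is why it can be fed directly into the extended subtract as the "+1" of the two's-complement form.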
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Borrow Generate Extended Required v 1.0 


bgx rt,ra,rb

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The operand from register RA is subtracted from the operand from register RB. An additional ‘1’ is
subtracted from the result if the least-significant bit of RT is ‘0’. If the result is less than zero, a ‘0’ is placed in
register RT. Otherwise, register RT is set to ‘1’. Bits 0 to 30 of the RT input are reserved and should be
zero.

for j = 0 to 15 by 4
    if (RT_(j*8+31)) then
        if (RB^j::4 ≥u RA^j::4) then RT^j::4 ← 1
        else RT^j::4 ← 0
    else
        if (RB^j::4 >u RA^j::4) then RT^j::4 ← 1
        else RT^j::4 ← 0
end
Multiply Required v 1.0 
mpy rt,ra,rb 


Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

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For each of four word slots: 


* The value in the rightmost 16 bits of register RA is multiplied by the value in the rightmost 16 bits of
register RB.


* The 32-bit product is placed in register RT. 


* The leftmost 16 bits of each operand are ignored. 

















RT^0:3 ← RA^2:3 * RB^2:3
RT^4:7 ← RA^6:7 * RB^6:7
RT^8:11 ← RA^10:11 * RB^10:11
RT^12:15 ← RA^14:15 * RB^14:15
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Multiply Unsigned Required v 1.0 














mpyu rt,ra,rb

Instruction format: opcode 01111001100 (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The rightmost 16 bits of register RA are multiplied by the rightmost 16 bits of register RB, treating both
operands as unsigned.
* The 32-bit product is placed in register RT.

RT^0:3 ← RA^2:3 |*| RB^2:3
RT^4:7 ← RA^6:7 |*| RB^6:7
RT^8:11 ← RA^10:11 |*| RB^10:11
RT^12:15 ← RA^14:15 |*| RB^14:15
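
The difference between the signed and unsigned 16-bit multiplies can be seen in a short Python sketch that models one word slot as a 32-bit integer. The function names mirror the mnemonics; `s16` is a hypothetical helper.

```python
MASK32 = 0xFFFFFFFF


def s16(x: int) -> int:
    """Interpret the rightmost 16 bits of x as a signed value."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


def mpy(ra: int, rb: int) -> int:
    """Signed 16 x 16 multiply of the rightmost halfwords; 32-bit product."""
    return (s16(ra) * s16(rb)) & MASK32


def mpyu(ra: int, rb: int) -> int:
    """Unsigned 16 x 16 multiply of the rightmost halfwords; 32-bit product."""
    return (ra & 0xFFFF) * (rb & 0xFFFF)


# The leftmost halfword (0x1234) is ignored; 0xFFFF is -1 signed, 65535 unsigned.
print(hex(mpy(0x1234FFFF, 0x00000002)))
print(hex(mpyu(0x1234FFFF, 0x00000002)))
```

In both cases the full 32-bit product fits in the word slot, so no truncation occurs.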
Multiply Immediate Required v 1.0
mpyi rt,ra,value

Instruction format: opcode (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The signed value in the I10 field is multiplied by the value in the rightmost 16 bits of register RA.
* The resulting product is placed in register RT.

t ← RepLeftBit(I10,16)
RT^0:3 ← RA^2:3 * t
RT^4:7 ← RA^6:7 * t
RT^8:11 ← RA^10:11 * t
RT^12:15 ← RA^14:15 * t
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Multiply Unsigned Immediate Required v 1.0 
mpyui rt,ra,value 

Instruction format: opcode (bits 0-7) | I10 (bits 8-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The signed value in the I10 field is extended to 16 bits by replicating the leftmost bit. The resulting value is
multiplied by the rightmost 16 bits of register RA, treating both operands as unsigned.
* The resulting product is placed in register RT.

t ← RepLeftBit(I10,16)
RT^0:3 ← RA^2:3 |*| t
RT^4:7 ← RA^6:7 |*| t
RT^8:11 ← RA^10:11 |*| t
RT^12:15 ← RA^14:15 |*| t
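
One consequence of this definition is that a negative immediate is first sign-extended to 16 bits and only then reinterpreted as unsigned. The Python sketch below contrasts that with a signed immediate multiply for a single word slot; the function names mirror the mnemonics and the helpers are assumptions made for illustration.

```python
MASK32 = 0xFFFFFFFF


def sign_extend(value: int, width: int) -> int:
    """Interpret the low `width` bits of value as a two's-complement number."""
    sign = 1 << (width - 1)
    value &= (1 << width) - 1
    return (value ^ sign) - sign


def mpyi(ra: int, i10: int) -> int:
    """Signed multiply of the rightmost halfword of RA by the signed immediate."""
    return (sign_extend(ra, 16) * sign_extend(i10, 10)) & MASK32


def mpyui(ra: int, i10: int) -> int:
    """Sign-extend I10 to 16 bits, then multiply as unsigned 16-bit values."""
    t = sign_extend(i10, 10) & 0xFFFF
    return (ra & 0xFFFF) * t


# I10 = 0x3FF encodes -1: signed gives -2, unsigned gives 2 * 0xFFFF.
print(hex(mpyi(0x00000002, 0x3FF)))
print(hex(mpyui(0x00000002, 0x3FF)))
```

For non-negative immediates the two forms differ only in how the register operand is interpreted.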
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Multiply and Add Required v 1.0 
mpya rt,ra,rb,rc

Instruction format: opcode 1100 (bits 0-3) | RT (bits 4-10) | RB (bits 11-17) | RA (bits 18-24) | RC (bits 25-31)

For each of four word slots: 


* The value in register RA is treated as a 16-bit signed integer and multiplied by the 16-bit signed value in 
register RB. The resulting product is added to the value in register RC. 


* The result is placed in register RT. 
* Overflows and carries are not detected. 


Programming Note: The operands are right-aligned within the 32-bit field. 
































t0 ← RA^2:3 * RB^2:3
t1 ← RA^6:7 * RB^6:7
t2 ← RA^10:11 * RB^10:11
t3 ← RA^14:15 * RB^14:15
RT^0:3 ← t0 + RC^0:3
RT^4:7 ← t1 + RC^4:7
RT^8:11 ← t2 + RC^8:11
RT^12:15 ← t3 + RC^12:15



Multiply High Required v 1.0
mpyh rt,ra,rb

Instruction format: opcode (bits 0-10) | RB (bits 11-17) | RA (bits 18-24) | RT (bits 25-31)

For each of four word slots:

* The leftmost 16 bits of the value in register RA are shifted right by 16 bits and multiplied by the 16-bit
value in register RB.
* The product is shifted left by 16 bits and placed in register RT. Bits shifted out at the left are discarded.
Zeros are shifted in at the right.


























t0 ← RA^{0:1} * RB^{2:3}
t1 ← RA^{4:5} * RB^{6:7}
t2 ← RA^{8:9} * RB^{10:11}
t3 ← RA^{12:13} * RB^{14:15}
RT^{0:3} ← t0^{2:3} || 0x0000
RT^{4:7} ← t1^{2:3} || 0x0000
RT^{8:11} ← t2^{2:3} || 0x0000
RT^{12:15} ← t3^{2:3} || 0x0000








Programming Note: This instruction can be used in conjunction with mpyu and Add Word (a) to perform a
32-bit multiply. A 32-bit multiply instruction, mpy32 rt,ra,rb, can be emulated with the following instruction
sequence:
mpyh 
mpyh 
mpyu 
a 
a 
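
To see why two mpyh, one mpyu, and two Add Word instructions are enough for a 32-bit multiply, the sequence can be sketched on a single word slot in Python. The operand order and the temporaries t1, t2, and t3 below are assumptions chosen so that the arithmetic works out, as are the function names. Because mpyh keeps only the low 16 bits of its product, the model treats its operands as unsigned without affecting the result.

```python
MASK32 = 0xFFFFFFFF

def mpyh(a, b):
    """(leftmost 16 bits of a) * (rightmost 16 bits of b), shifted left by 16."""
    return ((((a >> 16) & 0xFFFF) * (b & 0xFFFF)) << 16) & MASK32

def mpyu(a, b):
    """Unsigned product of the rightmost 16 bits of a and b."""
    return (a & 0xFFFF) * (b & 0xFFFF)

def add_word(a, b):
    """32-bit add; the carry out is discarded."""
    return (a + b) & MASK32

def mpy32(a, b):
    t1 = mpyh(a, b)            # a_hi * b_lo, in the upper halfword
    t2 = mpyh(b, a)            # b_hi * a_lo, in the upper halfword
    t3 = mpyu(a, b)            # a_lo * b_lo, full 32-bit product
    return add_word(add_word(t1, t2), t3)

print(hex(mpy32(0x12345678, 0x9ABCDEF0)))
```

The term a_hi * b_hi never appears because it contributes only to bits above the low 32 bits of the product.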
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Multiply and Shift Right Required v 1.0 
mpys rt,ra,rb 

Field:  01111000111  RB     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of four word slots: 


* The value in the rightmost 16 bits of register RA is multiplied by the value in the rightmost 16 bits of
register RB.

* The leftmost 16 bits of the 32-bit product are placed in the rightmost 16 bits of register RT, with the sign
bit replicated into the left 16 bits of the register.
































t0 ← RA^{2:3} * RB^{2:3}
t1 ← RA^{6:7} * RB^{6:7}
t2 ← RA^{10:11} * RB^{10:11}
t3 ← RA^{14:15} * RB^{14:15}
RT^{0:3} ← RepLeftBit(t0^{0:1},32)
RT^{4:7} ← RepLeftBit(t1^{0:1},32)
RT^{8:11} ← RepLeftBit(t2^{0:1},32)
RT^{12:15} ← RepLeftBit(t3^{0:1},32)

Multiply High High Required v 1.0
mpyhh rt,ra,rb

Field:  opcode  RB     RA     RT
Bits:   0:10    11:17  18:24  25:31

For each of four word slots:

* The leftmost 16 bits in register RA are multiplied by the leftmost 16 bits in register RB.

* The 32-bit product is placed in register RT.














RT^{0:3} ← RA^{0:1} * RB^{0:1}
RT^{4:7} ← RA^{4:5} * RB^{4:5}
RT^{8:11} ← RA^{8:9} * RB^{8:9}
RT^{12:15} ← RA^{12:13} * RB^{12:13}

Multiply High High and Add Required v 1.0
mpyhha rt,ra,rb

Field:  opcode  RB     RA     RT
Bits:   0:10    11:17  18:24  25:31

For each of four word slots:

* The leftmost 16 bits in register RA are multiplied by the leftmost 16 bits in register RB. The product is
added to the value in register RT.

* The sum is placed in register RT.











RT^{0:3} ← RA^{0:1} * RB^{0:1} + RT^{0:3}
RT^{4:7} ← RA^{4:5} * RB^{4:5} + RT^{4:7}
RT^{8:11} ← RA^{8:9} * RB^{8:9} + RT^{8:11}
RT^{12:15} ← RA^{12:13} * RB^{12:13} + RT^{12:15}

mpyhhu rt,ra,rb

01111001110 RB RA RT

For each of four word slots:

* The leftmost 16 bits in register RA are multiplied by the leftmost 16 bits in register RB, treating both
operands as unsigned.

* The 32-bit product is placed in register RT.





RT^{0:3} ← RA^{0:1} |*| RB^{0:1}
RT^{4:7} ← RA^{4:5} |*| RB^{4:5}
RT^{8:11} ← RA^{8:9} |*| RB^{8:9}
RT^{12:15} ← RA^{12:13} |*| RB^{12:13}

For each of four word slots:

* The leftmost 16 bits in register RA are multiplied by the leftmost 16 bits in register RB, treating both
operands as unsigned. The product is added to the value in register RT.

* The sum is placed in register RT.














RT^{0:3} ← RA^{0:1} |*| RB^{0:1} + RT^{0:3}
RT^{4:7} ← RA^{4:5} |*| RB^{4:5} + RT^{4:7}
RT^{8:11} ← RA^{8:9} |*| RB^{8:9} + RT^{8:11}
RT^{12:15} ← RA^{12:13} |*| RB^{12:13} + RT^{12:15}

Count Leading Zeros Required v 1.0
clz rt,ra

01010100101 /// RA RT

For each of four word slots:
* The number of zero bits to the left of the first ‘1’ bit in the operand in register RA is computed.
* The result is placed in register RT. If register RA is zero, the result is 32.

Programming Note: The result placed in register RT satisfies 0 ≤ RT ≤ 32. The value in register RT is zero,
for example, if the corresponding slot in RA is a negative integer. The value in register RT is 32 if the
corresponding slot in register RA is zero.











for j = 0 to 15 by 4
   t ← 0
   u ← RA^{j::4}
   for m = 0 to 31
      if u_m = 1 then leave
      t ← t + 1
   end
   RT^{j::4} ← t
end
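
The boundary cases in the programming note are easy to check with a small model. The following Python sketch is illustrative only: the function names are hypothetical, and a register is represented as a list of four unsigned 32-bit words.

```python
def clz32(word):
    """Number of zero bits left of the first 1 bit in a 32-bit word (32 if zero)."""
    word &= 0xFFFFFFFF
    return 32 - word.bit_length()

def clz(ra_words):
    """Apply the count independently to each of the four word slots."""
    return [clz32(w) for w in ra_words]

print(clz([0, 1, 0x80000000, 0x0000FFFF]))  # [32, 31, 0, 16]
```

A word with its leftmost bit set (a negative integer in two's complement) yields 0, and a zero word yields 32, matching the stated range.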
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cntb rt,ra 

01010110100 /// RA RT

For each of 16 byte slots:
* The number of bits in register RA whose value is ‘1’ is computed.
* The result is placed in register RT.

Programming Note: The result placed in register RT satisfies 0 ≤ RT ≤ 8. The value in register RT is zero, for
example, if the value in RA is zero. The value in RT is 8 if the value in RA is -1.





for j = 0 to 15
   c = 0
   b ← RA^j
   for m = 0 to 7
      if b_m = 1 then c ← c + 1
   end
   RT^j ← c
end
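
A per-byte population count is compact to model. In this sketch a register is a 16-byte Python bytes object; the function name and that representation are assumptions for illustration.

```python
def cntb(ra):
    """Return 16 bytes, each holding the number of 1 bits in the matching byte of ra."""
    return bytes(bin(b).count("1") for b in ra)

ra = bytes([0x00, 0xFF, 0x0F, 0x81] + [0x00] * 12)
print(list(cntb(ra))[:4])  # [0, 8, 4, 2]
```

Each result byte lies between 0 and 8, so it always fits in its own slot.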











(See also the Form Select Mask for Bytes instruction on page 85.) 
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fsmb rt,ra 


/// RA RT

The rightmost 16 bits of the preferred slot of register RA are used to create a mask in register RT by
replicating each bit eight times. Bits in the operand are related to bytes in the result in a left-to-right
correspondence.






s ← RA^{2:3}
for j = 0 to 15
   if s_j = 0 then r^j ← 0x00
   else r^j ← 0xFF
end
RT ← r
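
The left-to-right correspondence between operand bits and result bytes can be made concrete with a short Python sketch. The function name and the 16-byte bytes representation of a register are assumptions; bit 0 is the leftmost bit, as in the pseudocode.

```python
def fsmb(ra):
    """Expand the rightmost 16 bits of the preferred slot into a 16-byte mask."""
    s = int.from_bytes(ra[2:4], "big")          # bytes 2:3 of the register
    # Bit j of s (counted from the left) controls byte j of the result.
    return bytes(0xFF if (s >> (15 - j)) & 1 else 0x00 for j in range(16))

ra = bytes([0x00, 0x00, 0x80, 0x01] + [0x00] * 12)
print(fsmb(ra).hex())
```

With the operand 0x8001, only the first and last bytes of the mask are set to 0xFF.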
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The rightmost 8 bits of the preferred slot of register RA are used to create a mask in register RT by replicating 
each bit 16 times. Bits in the operand are related to halfwords in the result, in a left-to-right correspondence. 





s ← RA^3
k = 0
for j = 0 to 7
   if s_j = 0 then r^{k::2} ← 0x0000
   else r^{k::2} ← 0xFFFF
   k = k + 2
end
RT ← r
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The rightmost 4 bits of the preferred slot of register RA are used to create a mask in register RT by replicating 
each bit 32 times. Bits in the operand are related to words in the result in a left-to-right correspondence. 





s ← RA_{28:31}
k = 0
for j = 0 to 3
   if s_j = 0 then r^{k::4} ← 0x00000000
   else r^{k::4} ← 0xFFFFFFFF
   k = k + 4
end
RT ← r











Gather Bits from Bytes Required v 1.0 
gbb rt,ra

Field:  opcode  ///    RA     RT
Bits:   0:10    11:17  18:24  25:31

A 16-bit quantity is formed in the right half of the preferred slot of register RT by concatenating the rightmost
bit in each byte of register RA. The leftmost 16 bits of register RT are set to zero, as are the remaining slots of
register RT.





k = 0
s = 0
for j = 7 to 128 by 8
   s_k ← RA_j
   k = k + 1
end
RT^{0:3} ← 0x0000 || s
RT^{4:7} ← 0
RT^{8:11} ← 0
RT^{12:15} ← 0
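
The gather operation can be sketched in Python as follows. The function name and the 16-byte bytes representation of a register are assumptions made for illustration.

```python
def gbb(ra):
    """Collect the rightmost bit of each of the 16 bytes into a 16-bit value."""
    s = 0
    for byte in ra:                 # byte 0 supplies the leftmost bit of s
        s = (s << 1) | (byte & 1)
    # Preferred slot: 16 zero bits followed by s; the other three slots are zero.
    return bytes(2) + s.to_bytes(2, "big") + bytes(12)

ra = bytes([0x01] + [0xFE] * 14 + [0xFF])
print(gbb(ra).hex())
```

Only the rightmost bit of each source byte matters, so the 0xFE bytes contribute zeros even though their other bits are set.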











Gather Bits from Halfwords Required v 1.0 
gbh rt,ra

Field:  opcode  ///    RA     RT
Bits:   0:10    11:17  18:24  25:31

An 8-bit quantity is formed in the rightmost byte of the preferred slot of register RT by concatenating the
rightmost bit in each halfword of register RA. The leftmost 24 bits of the preferred slot of register RT are set to
zero, as are the remaining slots of register RT.





k = 0
s = 0x00
for j = 15 to 128 by 16
   s_k ← RA_j
   k = k + 1
end
RT^{0:3} ← 0x000000 || s
RT^{4:7} ← 0
RT^{8:11} ← 0
RT^{12:15} ← 0

A 4-bit quantity is formed in the rightmost 4 bits of register RT by concatenating the rightmost bit in each word
of register RA. The leftmost 28 bits of register RT are set to zero, as are the remaining slots of register RT.

k = 0
s = 0x0
for j = 31 to 128 by 32
   s_k ← RA_j
   k = k + 1
end
RT^{0:3} ← 0x0000000 || s
RT^{4:7} ← 0
RT^{8:11} ← 0
RT^{12:15} ← 0

For each of 16 byte slots:

* The operand from register RA is added to the operand from register RB, and ‘1’ is added to the result.
These additions are done without loss of precision.
* That result is shifted to the right by 1 bit and placed in register RT.











for j = 0 to 15
   RT^j ← ((0x00 || RA^j) + (0x00 || RB^j) + 1)_{7:14}
end
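
Selecting bits 7:14 of the 16-bit sum is the same as shifting the sum right by one bit, which gives a rounded-up byte average. A minimal Python sketch, with a hypothetical function name and registers represented as 16-byte bytes objects:

```python
def avgb(ra, rb):
    """Per-byte (a + b + 1) >> 1, computed without losing the carry."""
    return bytes((a + b + 1) >> 1 for a, b in zip(ra, rb))

ra = bytes([0, 1, 2, 255] + [0] * 12)
rb = bytes([0, 2, 3, 0] + [0] * 12)
print(list(avgb(ra, rb))[:4])  # [0, 2, 3, 128]
```

Python integers do not overflow, so the intermediate sum keeps full precision just as the 16-bit intermediate does in the pseudocode.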
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absdb rt,ra,rb 

00001010011 RB RA RT

For each of 16 byte slots:
* The operand in register RA is subtracted from the operand in register RB.

* The absolute value of the result is placed in register RT.

Programming Note: The operands are unsigned.











for j = 0 to 15
   if (RB^j >^u RA^j) then RT^j ← RB^j - RA^j
   else RT^j ← RA^j - RB^j
end
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sumb rt,ra,rb 
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For each of four word slots: 


* The 4 bytes in register RB are added, and the 16-bit result is placed in bytes 0 and 1 of register RT. 
* The 4 bytes in register RA are added, and the 16-bit result is placed in bytes 2 and 3 of register RT. 


Programming Note: The operands are unsigned. 


























RT^{0:1} ← RB^0 + RB^1 + RB^2 + RB^3
RT^{2:3} ← RA^0 + RA^1 + RA^2 + RA^3
RT^{4:5} ← RB^4 + RB^5 + RB^6 + RB^7
RT^{6:7} ← RA^4 + RA^5 + RA^6 + RA^7
RT^{8:9} ← RB^8 + RB^9 + RB^{10} + RB^{11}
RT^{10:11} ← RA^8 + RA^9 + RA^{10} + RA^{11}
RT^{12:13} ← RB^{12} + RB^{13} + RB^{14} + RB^{15}
RT^{14:15} ← RA^{12} + RA^{13} + RA^{14} + RA^{15}
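
Note that the RB sums land in the left halfword of each word and the RA sums in the right halfword. The following Python sketch makes that placement explicit; the function name and the 16-byte bytes representation of a register are assumptions.

```python
def sumb(ra, rb):
    """For each word slot: sum of 4 RB bytes in bytes 0:1, sum of 4 RA bytes in bytes 2:3."""
    out = bytearray()
    for j in range(0, 16, 4):
        out += sum(rb[j:j + 4]).to_bytes(2, "big")   # unsigned, at most 1020
        out += sum(ra[j:j + 4]).to_bytes(2, "big")
    return bytes(out)

ra = bytes(range(16))
rb = bytes([255] * 16)
print(sumb(ra, rb).hex())
```

Four unsigned bytes sum to at most 1020, so the 16-bit result fields cannot overflow.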
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For each of eight halfword slots: 


* The sign of the byte in the right byte of the operand in register RA is propagated to the left byte. 
* The resulting 16-bit integer is stored in register RT. 


Programming Note: This is the only instruction that treats bytes as signed. 





























RT^{0:1} ← RepLeftBit(RA^1,16)
RT^{2:3} ← RepLeftBit(RA^3,16)
RT^{4:5} ← RepLeftBit(RA^5,16)
RT^{6:7} ← RepLeftBit(RA^7,16)
RT^{8:9} ← RepLeftBit(RA^9,16)
RT^{10:11} ← RepLeftBit(RA^{11},16)
RT^{12:13} ← RepLeftBit(RA^{13},16)
RT^{14:15} ← RepLeftBit(RA^{15},16)

For each of four word slots:

* The sign of the halfword in the right half of the operand in register RA is propagated to the left halfword.

* The resulting 32-bit integer is placed in register RT.

RT^{0:3} ← RepLeftBit(RA^{2:3},32)
RT^{4:7} ← RepLeftBit(RA^{6:7},32)
RT^{8:11} ← RepLeftBit(RA^{10:11},32)
RT^{12:15} ← RepLeftBit(RA^{14:15},32)

For each of two doubleword slots:
* The sign of the word in the right slot is propagated to the left word.

* The resulting 64-bit integer is stored in register RT.

RT^{0:7} ← RepLeftBit(RA^{4:7},64)
RT^{8:15} ← RepLeftBit(RA^{12:15},64)
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The values in register RA and register RB are logically ANDed. The result is placed in register RT. 
































RT^{0:3} ← RA^{0:3} & RB^{0:3}
RT^{4:7} ← RA^{4:7} & RB^{4:7}
RT^{8:11} ← RA^{8:11} & RB^{8:11}
RT^{12:15} ← RA^{12:15} & RB^{12:15}

The value in register RA is logically ANDed with the complement of the value in register RB. The result is
placed in register RT.






































RT^{0:3} ← RA^{0:3} & (¬RB^{0:3})
RT^{4:7} ← RA^{4:7} & (¬RB^{4:7})
RT^{8:11} ← RA^{8:11} & (¬RB^{8:11})
RT^{12:15} ← RA^{12:15} & (¬RB^{12:15})

andbi rt,ra,value

00010110 I10 RA RT

For each of 16 byte slots, the rightmost 8 bits of the I10 field are ANDed with the value in register RA. The
result is placed in register RT.

















b ← I10 & 0x00FF
bbbb ← b || b || b || b
RT^{0:3} ← RA^{0:3} & bbbb
RT^{4:7} ← RA^{4:7} & bbbb
RT^{8:11} ← RA^{8:11} & bbbb
RT^{12:15} ← RA^{12:15} & bbbb

For each of eight halfword slots:

* The I10 field is extended to 16 bits by replicating its leftmost bit. The result is ANDed with the value in
register RA.
* The 16-bit result is placed in register RT.























t ← RepLeftBit(I10,16)
RT^{0:1} ← RA^{0:1} & t
RT^{2:3} ← RA^{2:3} & t
RT^{4:5} ← RA^{4:5} & t
RT^{6:7} ← RA^{6:7} & t
RT^{8:9} ← RA^{8:9} & t
RT^{10:11} ← RA^{10:11} & t
RT^{12:13} ← RA^{12:13} & t
RT^{14:15} ← RA^{14:15} & t

For each of four word slots:

* The value of the I10 field is extended to 32 bits by replicating its leftmost bit. The result is ANDed with the
contents of register RA.
* The result is placed in register RT.














t ← RepLeftBit(I10,32)
RT^{0:3} ← RA^{0:3} & t
RT^{4:7} ← RA^{4:7} & t
RT^{8:11} ← RA^{8:11} & t
RT^{12:15} ← RA^{12:15} & t

The values in register RA and register RB are logically ORed. The result is placed in register RT.























RT^{0:3} ← RA^{0:3} | RB^{0:3}
RT^{4:7} ← RA^{4:7} | RB^{4:7}
RT^{8:11} ← RA^{8:11} | RB^{8:11}
RT^{12:15} ← RA^{12:15} | RB^{12:15}

The value in register RA is ORed with the complement of the value in register RB. The result is placed in
register RT.

RT^{0:3} ← RA^{0:3} | (¬RB^{0:3})
RT^{4:7} ← RA^{4:7} | (¬RB^{4:7})
RT^{8:11} ← RA^{8:11} | (¬RB^{8:11})
RT^{12:15} ← RA^{12:15} | (¬RB^{12:15})

orbi rt,ra,value

00000110 I10 RA RT

For each of 16 byte slots:
* The rightmost 8 bits of the I10 field are ORed with the value in register RA.

* The result is placed in register RT.























b ← I10 & 0x00FF
bbbb ← b || b || b || b
RT^{0:3} ← RA^{0:3} | bbbb
RT^{4:7} ← RA^{4:7} | bbbb
RT^{8:11} ← RA^{8:11} | bbbb
RT^{12:15} ← RA^{12:15} | bbbb

For each of eight halfword slots:

* The I10 field is extended to 16 bits by replicating its leftmost bit. The result is ORed with the value in
register RA.

* The result is placed in register RT.


























t ← RepLeftBit(I10,16)
RT^{0:1} ← RA^{0:1} | t
RT^{2:3} ← RA^{2:3} | t
RT^{4:5} ← RA^{4:5} | t
RT^{6:7} ← RA^{6:7} | t
RT^{8:9} ← RA^{8:9} | t
RT^{10:11} ← RA^{10:11} | t
RT^{12:13} ← RA^{12:13} | t
RT^{14:15} ← RA^{14:15} | t

ori rt,ra,value

00000100 I10 RA RT

For each of four word slots:
* The I10 field is sign-extended to 32 bits and ORed with the contents of register RA.

* The result is placed in register RT.




















t ← RepLeftBit(I10,32)
RT^{0:3} ← RA^{0:3} | t
RT^{4:7} ← RA^{4:7} | t
RT^{8:11} ← RA^{8:11} | t
RT^{12:15} ← RA^{12:15} | t

orx rt,ra

The four words of RA are logically ORed. The result is placed in the preferred slot of register RT. The other
three slots of the register are written with zeros.





RT^{0:3} ← RA^{0:3} | RA^{4:7} | RA^{8:11} | RA^{12:15}
RT^{4:15} ← 0

The values in register RA and register RB are logically XORed. The result is placed in register RT.














RT^{0:3} ← RA^{0:3} ⊕ RB^{0:3}
RT^{4:7} ← RA^{4:7} ⊕ RB^{4:7}
RT^{8:11} ← RA^{8:11} ⊕ RB^{8:11}
RT^{12:15} ← RA^{12:15} ⊕ RB^{12:15}













Exclusive Or Byte Immediate Required v 1.0 


xorbi rt,ra,value

Field:  opcode  I10   RA     RT
Bits:   0:7     8:17  18:24  25:31

For each of 16 byte slots:
* The rightmost 8 bits of the I10 field are XORed with the value in register RA.

* The result is placed in register RT.

















b ← I10 & 0x00FF
bbbb ← b || b || b || b
RT^{0:3} ← RA^{0:3} ⊕ bbbb
RT^{4:7} ← RA^{4:7} ⊕ bbbb
RT^{8:11} ← RA^{8:11} ⊕ bbbb
RT^{12:15} ← RA^{12:15} ⊕ bbbb

Exclusive Or Halfword Immediate Required v 1.0
xorhi rt,ra,value

Field:  01000101  I10   RA     RT
Bits:   0:7       8:17  18:24  25:31




















For each of eight halfword slots: 


* The I10 field is extended to 16 bits by replicating the leftmost bit. The resulting value is XORed with the
value in register RA.

* The 16-bit result is placed in register RT.























t ← RepLeftBit(I10,16)
RT^{0:1} ← RA^{0:1} ⊕ t
RT^{2:3} ← RA^{2:3} ⊕ t
RT^{4:5} ← RA^{4:5} ⊕ t
RT^{6:7} ← RA^{6:7} ⊕ t
RT^{8:9} ← RA^{8:9} ⊕ t
RT^{10:11} ← RA^{10:11} ⊕ t
RT^{12:13} ← RA^{12:13} ⊕ t
RT^{14:15} ← RA^{14:15} ⊕ t

Exclusive Or Word Immediate Required v 1.0
xori rt,ra,value

Field:  opcode  I10   RA     RT
Bits:   0:7     8:17  18:24  25:31

For each of four word slots:
* The I10 field is sign-extended to 32 bits and XORed with the contents of register RA.

* The 32-bit result is placed in register RT.

















t ← RepLeftBit(I10,32)
RT^{0:3} ← RA^{0:3} ⊕ t
RT^{4:7} ← RA^{4:7} ⊕ t
RT^{8:11} ← RA^{8:11} ⊕ t
RT^{12:15} ← RA^{12:15} ⊕ t

Nand Required v 1.0
nand rt,ra,rb

Field:  opcode  RB     RA     RT
Bits:   0:10    11:17  18:24  25:31

For each of four word slots:
* The complement of the AND of the bit in register RA and the bit in register RB is placed in register RT.

















RT^{0:3} ← ¬(RA^{0:3} & RB^{0:3})
RT^{4:7} ← ¬(RA^{4:7} & RB^{4:7})
RT^{8:11} ← ¬(RA^{8:11} & RB^{8:11})
RT^{12:15} ← ¬(RA^{12:15} & RB^{12:15})

Nor Required v 1.0
nor rt,ra,rb

Field:  00001001001  RB     RA     RT
Bits:   0:10         11:17  18:24  25:31

For each of four word slots:

* The values in register RA and register RB are logically ORed.

* The result is complemented and placed in register RT.

RT^{0:3} ← ¬(RA^{0:3} | RB^{0:3})
RT^{4:7} ← ¬(RA^{4:7} | RB^{4:7})
RT^{8:11} ← ¬(RA^{8:11} | RB^{8:11})
RT^{12:15} ← ¬(RA^{12:15} | RB^{12:15})

Equivalent Required v 1.0
eqv rt,ra,rb

Field:  01001001001  RB     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of four word slots: 
* If the corresponding bits in register RA and register RB are the same, the result bit is ‘1’; otherwise, the result bit is ‘0’.


* The result is placed in register RT. 














RT^{0:3} ← RA^{0:3} ⊕ (¬RB^{0:3})
RT^{4:7} ← RA^{4:7} ⊕ (¬RB^{4:7})
RT^{8:11} ← RA^{8:11} ⊕ (¬RB^{8:11})
RT^{12:15} ← RA^{12:15} ⊕ (¬RB^{12:15})

Select Bits Required v 1.0
selb rt,ra,rb,rc

Field:  1000  RT    RB     RA     RC
Bits:   0:3   4:10  11:17  18:24  25:31























A result is formed by using bits from RC to choose corresponding bits either from RA or RB. 


* If the bit in register RC is ‘0’, then select the bit from register RA; otherwise, select the bit from register
RB.

* The selected bits are placed in register RT.








RT^{0:15} ← RC^{0:15} & RB^{0:15} | (¬RC^{0:15}) & RA^{0:15}
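
The bitwise select is a multiplexer controlled by RC. A minimal Python sketch follows; the function name and the 16-byte bytes representation of a register are assumptions for illustration.

```python
def selb(ra, rb, rc):
    """Per bit: take the RB bit where RC is 1, the RA bit where RC is 0."""
    return bytes((c & b) | (~c & a & 0xFF) for a, b, c in zip(ra, rb, rc))

ra = bytes([0xAA] * 16)
rb = bytes([0x55] * 16)
rc = bytes([0x0F] * 16)
print(selb(ra, rb, rc).hex())
```

Here the left nibble of every byte comes from RA and the right nibble from RB, so each result byte is 0xA5. Combined with a mask such as one produced by a form-select-mask or compare instruction, the same operation selects whole bytes, halfwords, or words.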
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COMPUTER 
Shuffle Bytes Required v 1.0 
shufb rt,ra,rb,rc 

Field:  1011  RT    RB     RA     RC
Bits:   0:3   4:10  11:17  18:24  25:31























Registers RA and RB are logically concatenated with the least-significant bit of RA adjacent to the most- 
significant bit of RB. The bytes of the resulting value are considered to be numbered from 0 to 31. 


For each byte slot in registers RC and RT: 


* The value in register RC is examined, and a result byte is produced as shown in Table 5-1. 


* The result byte is inserted into register RT. 


Table 5-1. Binary Values in Register RC and Byte Results 























Value in Register RC (Expressed in Binary)    Result Byte
10xxxxxx                                      0x00
110xxxxx                                      0xFF
111xxxxx                                      0x80
Otherwise                                     The byte of the concatenated register addressed by the rightmost 5 bits of register RC








Rconcat ← RA || RB
for j = 0 to 15
   b ← RC^j
   if b_{0:1} = 0b10 then c ← 0x00
   else if b_{0:2} = 0b110 then c ← 0xFF
   else if b_{0:2} = 0b111 then c ← 0x80
   else
      b ← b & 0x1F
      c ← Rconcat^b
   RT^j ← c
end
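
The selection rules in Table 5-1 translate directly into code. The following Python sketch is illustrative: the function name and the 16-byte bytes representation of a register are assumptions.

```python
def shufb(ra, rb, rc):
    """Build RT byte by byte from RA || RB under control of the bytes of RC."""
    concat = ra + rb                    # bytes 0..15 from RA, 16..31 from RB
    out = bytearray()
    for sel in rc:
        if sel >> 6 == 0b10:            # 10xxxxxx
            out.append(0x00)
        elif sel >> 5 == 0b110:         # 110xxxxx
            out.append(0xFF)
        elif sel >> 5 == 0b111:         # 111xxxxx
            out.append(0x80)
        else:                           # rightmost 5 bits index the concatenation
            out.append(concat[sel & 0x1F])
    return bytes(out)

ra = bytes(range(16))
rb = bytes(range(16, 32))
reverse = bytes(range(15, -1, -1))      # control word that reverses the bytes of RA
print(shufb(ra, rb, reverse).hex())
```

With a control register counting down from 15 to 0, the result is RA with its byte order reversed; other control patterns give byte permutes, merges of RA and RB, and insertion of the constants 0x00, 0xFF, and 0x80.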
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6. Shift and Rotate Instructions 


This section describes the SPU shift and rotate instructions. 
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Shift Left Halfword Required v 1.0 
shlh rt,ra,rb

Field:  00001011111  RB     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of eight halfword slots: 
* The contents of register RA are shifted to the left according to the count in bits 11 to 15 of register RB. 
* The result is placed in register RT. 


* If the count is zero, the contents of register RA are copied unchanged into register RT. If the count is
greater than 15, the result is zero.


* Bits shifted out of the left end of the halfword are discarded; zeros are shifted in at the right. 


Note: Each halfword slot has its own independent shift amount. 





for j = 0 to 15 by 2
   s ← RB^{j::2} & 0x001F
   t ← RA^{j::2}
   for b = 0 to 15
      if b + s < 16 then r_b ← t_{b+s}
      else r_b ← 0
   end
   RT^{j::2} ← r
end
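
The 5-bit count means that shift amounts from 16 to 31 clear the halfword, while larger values in RB wrap around because only bits 11 to 15 are examined. A short Python sketch illustrates this; the function name and the representation of a register as eight big-endian halfwords are assumptions.

```python
import struct

def shlh(ra, rb):
    """Shift each halfword of ra left by the low 5 bits of the matching halfword of rb."""
    out = []
    for t, count in zip(struct.unpack(">8H", ra), struct.unpack(">8H", rb)):
        s = count & 0x1F                      # bits 11 to 15 of the halfword
        out.append((t << s) & 0xFFFF if s < 16 else 0)
    return struct.pack(">8H", *out)

ra = struct.pack(">8H", *([0x8001] * 8))
rb = struct.pack(">8H", 0, 1, 4, 15, 16, 31, 32, 33)
print([hex(h) for h in struct.unpack(">8H", shlh(ra, rb))])
```

The last two slots show the wraparound: counts of 32 and 33 behave like counts of 0 and 1.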
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Shift Left Halfword Immediate Required v 1.0 


shlhi rt,ra,value 








For each of eight halfword slots: 


* The contents of register RA are shifted to the left according to the count in bits 13 to 17 of the I7 field. 
* The result is placed in register RT. 


* If the count is zero, the contents of register RA are copied unchanged into register RT. If the count is
greater than 15, the result is zero.


* Bits shifted out of the left end of the halfword are discarded; zeros are shifted in at the right. 











s ← RepLeftBit(I7,16) & 0x001F
for j = 0 to 15 by 2
   t ← RA^{j::2}
   for b = 0 to 15
      if b + s < 16 then r_b ← t_{b+s}
      else r_b ← 0
   end
   RT^{j::2} ← r
end

Shift Left Word Required v 1.0
shl rt,ra,rb

Field:  00001011011  RB     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of four word slots: 
* The contents of register RA are shifted to the left according to the count in bits 26 to 31 of register RB. 
* The result is placed in register RT. 


* If the count is zero, the contents of register RA are copied unchanged into register RT. If the count is
greater than 31, the result is zero.


* Bits shifted out of the left end of the word are discarded; zeros are shifted in at the right. 


Note: Each word slot has its own independent shift amount. 





for j = 0 to 15 by 4
   s ← RB^{j::4} & 0x0000003F
   t ← RA^{j::4}
   for b = 0 to 31
      if b + s < 32 then r_b ← t_{b+s}
      else r_b ← 0
   end
   RT^{j::4} ← r
end

Shift Left Word Immediate Required v 1.0
shli rt,ra,value

Field:  00001111011  I7     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of four word slots: 
* The contents of register RA are shifted to the left according to the count in bits 12 to 17 of the I7 field. 
* The result is placed in register RT. 


* If the count is zero, the contents of register RA are copied unchanged into register RT. If the count is
greater than 31, the result is zero.


* Bits shifted out of the left end of the word are discarded; zeros are shifted in at the right. 


s ← RepLeftBit(I7,32) & 0x0000003F
for j = 0 to 15 by 4
   t ← RA^{j::4}
   for b = 0 to 31
      if b + s < 32 then r_b ← t_{b+s}
      else r_b ← 0
   end
   RT^{j::4} ← r
end

Shift Left Quadword by Bits Required v 1.0
shlqbi rt,ra,rb

Field:  opcode  RB     RA     RT
Bits:   0:10    11:17  18:24  25:31

The contents of register RA are shifted to the left according to the count in bits 29 to 31 of the preferred slot of 
register RB. The result is placed in register RT. A shift of up to 7 bit positions is possible. 


If the count is zero, the contents of register RA are copied unchanged into register RT. 


Bits shifted out of the left end of the register are discarded, and zeros are shifted in at the right. 





s ← RB_{29:31}
for b = 0 to 127
   if b + s < 128 then r_b ← RA_{b+s}
   else r_b ← 0
end
RT ← r

Shift Left Quadword by Bits Immediate Required v 1.0
shlqbii rt,ra,value

The contents of register RA are shifted to the left according to the count in bits 15 to 17 of the I7 field. The 
result is placed in register RT. A shift of up to 7 bit positions is possible. 


If the count is zero, the contents of register RA are copied unchanged into register RT. 


Bits shifted out of the left end of the register are discarded, and zeros are shifted in at the right. 





s ← I7 & 0x07
for b = 0 to 127
   if b + s < 128 then r_b ← RA_{b+s}
   else r_b ← 0
end
RT ← r

Shift Left Quadword by Bytes Required v 1.0
shlqby rt,ra,rb

Field:  opcode  RB     RA     RT
Bits:   0:10    11:17  18:24  25:31

The bytes of register RA are shifted to the left according to the count in bits 27 to 31 of the preferred slot of 
register RB. The result is placed in register RT. 


If the count is zero, the contents of register RA are copied unchanged into register RT. If the count is greater 
than 15, the result is zero. 


Bytes shifted out of the left end of the register are discarded, and bytes of zeros are shifted in at the right. 





s ← RB_{27:31}
for b = 0 to 15
   if b + s < 16 then r^b ← RA^{b+s}
   else r^b ← 0
end
RT ← r

Shift Left Quadword by Bytes Immediate Required v 1.0
shlqbyi rt,ra,value

The bytes of register RA are shifted to the left according to the count in bits 13 to 17 of the I7 field. The result 
is placed in register RT. 


If the count is zero, the contents of register RA are copied unchanged into register RT. If the count is greater 
than 15, the result is zero. 


Bytes shifted out of the left end of the register are discarded, and zero bytes are shifted in at the right. 





s ← I7 & 0x1F
for b = 0 to 15
   if b + s < 16 then r^b ← RA^{b+s}
   else r^b ← 0
end
RT ← r

Shift Left Quadword by Bytes from Bit Shift Count Required v 1.0
shlqbybi rt,ra,rb

Field:  opcode  RB     RA     RT
Bits:   0:10    11:17  18:24  25:31

The bytes of register RA are shifted to the left according to the count in bits 24 to 28 of the preferred slot of 
register RB. The result is placed in register RT. 


If the count is zero, the contents of register RA are copied unchanged into register RT. If the count is greater 
than 15, the result is zero. 


Bytes shifted out of the left end of the register are discarded, and bytes of zeros are shifted in at the right. 





s ← RB_{24:28}
for b = 0 to 15
   if b + s < 16 then r^b ← RA^{b+s}
   else r^b ← 0x00
end
RT ← r
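
Taking the byte count from bits 24 to 28 amounts to dividing a bit shift count held in the preferred slot by 8. The Python sketch below models that reading; the function name and the 16-byte bytes representation of a register are assumptions.

```python
def shlqbybi(ra, rb):
    """Shift the 16 bytes of ra left by the byte count in bits 24:28 of rb's preferred slot."""
    preferred = int.from_bytes(rb[0:4], "big")
    count = (preferred >> 3) & 0x1F        # bits 24:28 of the word
    if count > 15:
        return bytes(16)
    return ra[count:] + bytes(count)       # zero bytes enter at the right

ra = bytes(range(16))
rb = (29).to_bytes(4, "big") + bytes(12)   # bit count 29 gives byte count 3
print(shlqbybi(ra, rb).hex())
```

The remaining 3 bits of the bit count (here 29 mod 8 = 5) are the bits that the Shift Left Quadword by Bits instruction examines, so the two instructions applied with the same RB plausibly combine into a full quadword shift by a bit count.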
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Rotate Halfword Required v 1.0 
roth rt,ra,rb 

Field:  00001011100  RB     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of eight halfword slots: 
* The contents of register RA are rotated to the left according to the count in bits 12 to 15 of register RB. 
* The result is placed in register RT. 
* If the count is zero, the contents of register RA are copied unchanged into register RT.


* Bits rotated out of the left end of the halfword are rotated in at the right end. 


Note: Each halfword slot has its own independent rotate amount. 








for j = 0 to 15 by 2
   s ← RB^{j::2} & 0x000F
   t ← RA^{j::2}
   for b = 0 to 15
      if b + s < 16 then r_b ← t_{b+s}
      else r_b ← t_{b+s-16}
   end
   RT^{j::2} ← r
end

Rotate Halfword Immediate Required v 1.0
rothi rt,ra,value

Field:  00001111100  I7     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of eight halfword slots: 
* The contents of register RA are rotated to the left according to the count in bits 14 to 17 of the I7 field. 
* The result is placed in register RT. 
* If the count is zero, the contents of register RA are copied unchanged into register RT.
* Bits rotated out of the left end of the halfword are rotated in at the right end. 














s ← RepLeftBit(I7,16) & 0x000F
for j = 0 to 15 by 2
   t ← RA^{j::2}
   for b = 0 to 15
      if b + s < 16 then r_b ← t_{b+s}
      else r_b ← t_{b+s-16}
   end
   RT^{j::2} ← r
end

Rotate Word Required v 1.0
rot rt,ra,rb

Field:  00001011000  RB     RA     RT
Bits:   0:10         11:17  18:24  25:31




















For each of four word slots: 
* The contents of register RA are rotated to the left according to the count in bits 27 to 31 of register RB. 
* The result is placed in register RT. 
* If the count is zero, the contents of register RA are copied unchanged into register RT.


* Bits rotated out of the left end of the word are rotated in at the right end. 





for j = 0 to 15 by 4
   s ← RB^{j::4} & 0x0000001F
   t ← RA^{j::4}
   for b = 0 to 31
      if b + s < 32 then r_b ← t_{b+s}
      else r_b ← t_{b+s-32}
   end
   RT^{j::4} ← r
end
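
The per-bit loop above is equivalent to the familiar shift-and-OR formulation of a rotate. A minimal Python sketch follows; the function names and the representation of a register as a list of four unsigned 32-bit words are assumptions.

```python
def rot32(word, count):
    """Rotate a 32-bit word left by the low 5 bits of count."""
    s = count & 0x1F
    word &= 0xFFFFFFFF
    return ((word << s) | (word >> (32 - s))) & 0xFFFFFFFF

def rot(ra_words, rb_words):
    """Each word slot is rotated by its own count from the matching slot of RB."""
    return [rot32(a, b) for a, b in zip(ra_words, rb_words)]

print([hex(w) for w in rot([0x80000001, 0x12345678, 0x12345678, 0x12345678],
                           [1, 8, 32, 36])])
```

Because only bits 27 to 31 of each count are used, a count of 32 leaves the word unchanged and a count of 36 rotates by 4.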
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Rotate Word Immediate Required v 1.0 


roti rt,ra,value 





For each of four word slots: 
* The contents of register RA are rotated to the left according to the count in bits 13 to 17 of the I7 field. 
* The result is placed in register RT. 
* If the count is zero, the contents of register RA are copied unchanged into register RT.
* Bits rotated out of the left end of the word are rotated in at the right end. 














S < RepLeftBit(I7,32) & 0x0000001F 
for j = 0 to 15 by 4 
te RA^ 
for b = 0 to 31 
if b + s < 32 then Tp € tpi 
else lh <—thag-32 
end 
RT/^« r 
end 
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Rotate Quadword by Bytes Required v 1.0 
rotqby rt,ra,rb 


opcode | RB    | RA    | RT
0-10   | 11-17 | 18-24 | 25-31

















The bytes in register RA are rotated to the left according to the count in the rightmost 4 bits of the preferred 
slot of register RB. The result is placed in register RT. Rotation of up to 15 byte positions is possible. 


If the count is zero, the contents of register RA are copied unchanged into register RT. 


Bytes rotated out of the left end of the register are rotated in at the right. 














s ← RB_{28:31}
for b = 0 to 15
    if b + s < 16 then r^b ← RA^{b+s}
    else r^b ← RA^{b+s-16}
end
RT ← r
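A minimal Python sketch of the byte rotation, assuming the 128-bit register is modeled as a 16-byte `bytes` object with byte 0 leftmost and that the caller passes the preferred word of RB as an integer. The function name is a hypothetical helper.

```python
def rotqby(ra: bytes, rb_preferred_word: int) -> bytes:
    """Rotate the 16 bytes of ra left by the count held in the
    rightmost 4 bits of the preferred word of RB."""
    assert len(ra) == 16
    s = rb_preferred_word & 0xF
    # Bytes leaving the left end re-enter at the right end.
    return ra[s:] + ra[:s]
```

Slicing makes the wraparound explicit: the first `s` bytes are the ones that rotate out at the left and reappear at the right.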
Rotate Quadword by Bytes Immediate Required v 1.0


rotqbyi rt,ra,value 





opcode | I7    | RA    | RT
0-10   | 11-17 | 18-24 | 25-31




















The bytes in register RA are rotated to the left according to the count in the rightmost 4 bits of the I7 field. The 
result is placed in register RT. Rotation of up to 15 byte positions is possible. 


If the count is zero, the contents of register RA are copied unchanged into register RT. 


Bytes rotated out of the left end of the register are rotated in at the right. 





s ← I7_{14:17}
for b = 0 to 15
    if b + s < 16 then r^b ← RA^{b+s}
    else r^b ← RA^{b+s-16}
end
RT ← r
Rotate Quadword by Bytes from Bit Shift Count Required v 1.0
rotqbybi rt,ra,rb 


opcode | RB    | RA    | RT
0-10   | 11-17 | 18-24 | 25-31

















The bytes of register RA are rotated to the left according to the count in bits 25 to 28 of the preferred slot of 
register RB. The result is placed in register RT. 


If the count is zero, the contents of register RA are copied unchanged into register RT. 


Bytes rotated out of the left end of the register are rotated in at the right. 














s ← RB_{24:28}
for b = 0 to 15
    if b + s < 16 then r^b ← RA^{b+s}
    else r^b ← RA^{b+s-16}
end
RT ← r
Rotate Quadword by Bits Required v 1.0
rotqbi rt,ra,rb 

opcode | RB    | RA    | RT
0-10   | 11-17 | 18-24 | 25-31




















The contents of register RA are rotated to the left according to the count in bits 29 to 31 of the preferred slot 
of register RB. The result is placed in register RT. Rotation of up to 7 bit positions is possible. 


If the count is zero, the contents of register RA are copied unchanged into register RT. 


Bits rotated out at the left end of the register are rotated in at the right. 





s ← RB_{29:31}
for b = 0 to 127
    if b + s < 128 then r_b ← RA_{b+s}
    else r_b ← RA_{b+s-128}
end
RT ← r
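Rotate Quadword by Bits covers only 0 to 7 bit positions, while Rotate Quadword by Bytes from Bit Shift Count takes a byte count from bits 25 to 28 of the same kind of count word. The Python sketch below shows how the two might be paired to rotate by an arbitrary bit count. The pairing is an assumed usage, not something this text prescribes. The quadword is modeled as a Python integer whose most-significant bit is bit 0, and the byte count is reduced modulo 16 (also an assumption, justified only by a 16-byte rotation repeating every 16 bytes). All function names are hypothetical.

```python
MASK128 = (1 << 128) - 1


def rotl128(value, n):
    """Rotate a 128-bit integer left by n bits (0 <= n < 128)."""
    return ((value << n) | (value >> (128 - n))) & MASK128


def rotqbybi(ra, rb_word):
    """Byte rotate: count taken from bits 25 to 28 of the count word."""
    byte_count = (rb_word >> 3) & 0xF
    return rotl128(ra, 8 * byte_count)


def rotqbi(ra, rb_word):
    """Bit rotate: count taken from bits 29 to 31 of the count word."""
    return rotl128(ra, rb_word & 0x7)


def rotate_quadword_left(ra, bit_count):
    """Assumed pairing: whole bytes first, then the remaining 0-7 bits."""
    return rotqbi(rotqbybi(ra, bit_count), bit_count)
```

The same count word feeds both steps: the upper bits select whole bytes and the low 3 bits select the residual bit rotation.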
Rotate Quadword by Bits Immediate Required v 1.0


rotqbii rt,ra,value 








opcode | I7    | RA    | RT
0-10   | 11-17 | 18-24 | 25-31

















The contents of register RA are rotated to the left according to the count in bits 15 to 17 of the I7 field. The 
result is placed in register RT. Rotation of up to 7 bit positions is possible. 


If the count is zero, the contents of register RA are copied unchanged into register RT. 


Bits rotated out at the left end of the register are rotated in at the right. 





s ← I7_{4:6}
for b = 0 to 127
    if b + s < 128 then r_b ← RA_{b+s}
    else r_b ← RA_{b+s-128}
end
RT ← r
Rotate and Mask Halfword Required v 1.0








rothm rt,ra,rb 

0 0 0 0 1 0 1 1 1 0 1 | RB    | RA    | RT
0-10                  | 11-17 | 18-24 | 25-31




















For each of eight halfword slots: 
* The shift_count is (0 - RB) modulo 32.
* If the shift_count is less than 16, then RT is set to the contents of RA shifted right shift_count bits, with
zero fill at the left.
* Otherwise, RT is set to zero.


Note: Each halfword slot has its own independent rotate amount. 





for j = 0 to 15 by 2
    s ← (0 - RB^{j::2}) & 0x001F
    t ← RA^{j::2}
    for b = 0 to 15
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← 0
    end
    RT^{j::2} ← r
end











Programming Note: The Rotate and Mask instructions provide support for a logical right shift, and the 
Rotate and Mask Algebraic instructions provide support for an algebraic right shift. They differ from a conven- 
tional right logical or algebraic shift in that the shift amount accepted by the instructions is the two's comple- 
ment of the right shift amount. Thus, to shift right logically the contents of R2 by the number of bits given in 
R1, the following sequence could be used: 





sfi r3,r1,0 Form two's complement 
rotm r4,r2,r3 Rotate, then mask 


For the immediate forms of these instructions, the formation of the two's complement shift quantity can be 
performed during assembly or compilation. 
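The two-instruction sequence in the programming note can be modeled for a single word slot as follows. This is a sketch under stated assumptions: `sfi_zero` stands in for `sfi r3,r1,0` (which forms 0 minus the register value), `rotm_slot` models one 32-bit slot of the word form `rotm`, and both names are hypothetical.

```python
def sfi_zero(r1):
    """Model of 'sfi r3,r1,0': form the two's complement of a 32-bit value."""
    return (0 - r1) & 0xFFFFFFFF


def rotm_slot(ra, rb):
    """One word slot of Rotate and Mask Word: shift_count = (0 - RB) mod 64."""
    s = (0 - rb) & 0x3F
    return (ra >> s) if s < 32 else 0


def logical_shift_right(value, count):
    """Shift value right logically by count bits using the sfi/rotm pair."""
    return rotm_slot(value, sfi_zero(count))
```

Negating the count twice (once in software, once inside the instruction) recovers the original right-shift amount, and counts of 32 to 63 produce zero.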
For each of eight halfword slots:
* The shift_count is (0 - I7) modulo 32.
* If the shift_count is less than 16, then RT is set to the contents of RA shifted right shift_count bits, with
zero fill at the left.
* Otherwise, RT is set to zero.

s ← (0 - RepLeftBit(I7,32)) & 0x0000001F
for j = 0 to 15 by 2
    t ← RA^{j::2}
    for b = 0 to 15
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← 0
    end
    RT^{j::2} ← r
end











Programming Note: The Rotate and Mask instructions provide support for a logical right shift, and the 
Rotate and Mask Algebraic instructions provide support for an algebraic right shift. They differ from a conven- 
tional right logical or algebraic shift in that the shift amount accepted by the instructions is the two’s comple- 
ment of the right shift amount. Thus, to shift right logically the contents of R2 by the number of bits given in 
R1, the following sequence could be used: 





sfi r3,r1,0 Form two's complement 
rotm r4,r2,r3 Rotate, then mask 











For the immediate forms of these instructions, the formation of the two’s complement shift quantity can be 
performed during assembly or compilation. 
For each of four word slots:
* The shift_count is (0 - RB) modulo 64.
* If the shift_count is less than 32, then RT is set to the contents of RA shifted right shift_count bits, with
zero fill at the left.
* Otherwise, RT is set to zero.

for j = 0 to 15 by 4
    s ← (0 - RB^{j::4}) & 0x0000003F
    t ← RA^{j::4}
    for b = 0 to 31
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← 0
    end
    RT^{j::4} ← r
end











Programming Note: The Rotate and Mask instructions provide support for a logical right shift, and the 
Rotate and Mask Algebraic instructions provide support for an algebraic right shift. They differ from a conven- 
tional right logical or algebraic shift in that the shift amount accepted by the instructions is the two’s comple- 
ment of the right shift amount. Thus, to shift right logically the contents of R2 by the number of bits given in 
R1, the following sequence could be used: 





sfi r3,r1,0 Form two’s complement 
rotm r4,r2,r3 Rotate, then mask 











For the immediate forms of these instructions, the formation of the two’s complement shift quantity can be 
performed during assembly or compilation. 
For each of four word slots:
* The shift_count is (0 - I7) modulo 64.
* If the shift_count is less than 32, then RT is set to the contents of RA shifted right shift_count bits, with
zero fill at the left.
* Otherwise, RT is set to zero.

s ← (0 - RepLeftBit(I7,32)) & 0x0000003F
for j = 0 to 15 by 4
    t ← RA^{j::4}
    for b = 0 to 31
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← 0
    end
    RT^{j::4} ← r
end











Programming Note: The Rotate and Mask instructions provide support for a logical right shift, and the 
Rotate and Mask Algebraic instructions provide support for an algebraic right shift. They differ from a conven- 
tional right logical or algebraic shift in that the shift amount accepted by the instructions is the two’s comple- 
ment of the right shift amount. Thus, to shift right logically the contents of R2 by the number of bits given in 
R1, the following sequence could be used: 





sfi r3,r1,0 Form two's complement 
rotm r4,r2,r3 Rotate, then mask 











For the immediate forms of these instructions, the formation of the two’s complement shift quantity can be 
performed during assembly or compilation. 
The shift_count is (0 - the preferred word of RB) modulo 32. If the shift_count is less than 16, then RT is set to
the contents of RA shifted right shift_count bytes, filling at the left with 0x00 bytes. Otherwise, RT is set to
zero.

s ← (0 - RB_{27:31}) & 0x1F
for b = 0 to 15
    if b ≥ s then r^b ← RA^{b-s}
    else r^b ← 0x00
end
RT ← r
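The byte-granular right shift can be sketched in Python as below, assuming a 16-byte `bytes` object models the register (byte 0 leftmost) and the preferred word of RB is passed as an integer. The function name is a hypothetical helper.

```python
def rotqmby(ra: bytes, rb_preferred_word: int) -> bytes:
    """Shift the 16 bytes of ra right by shift_count bytes, where
    shift_count = (0 - RB) modulo 32; counts of 16 or more give zero."""
    assert len(ra) == 16
    s = (0 - (rb_preferred_word & 0x1F)) & 0x1F
    if s >= 16:
        return bytes(16)
    # s zero bytes enter at the left; the rightmost s bytes are lost.
    return bytes(s) + ra[:16 - s]
```

To shift right by 3 bytes the caller supplies the two's complement of 3 (for example 0xFFFFFFFD), in line with the negated-count convention of the Rotate and Mask family.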
The shift_count is (0 - I7) modulo 32. If the shift_count is less than 16, then RT is set to the contents of RA
shifted right shift_count bytes, filling at the left with 0x00 bytes. Otherwise, all bytes of RT are set to 0x00.

s ← (0 - I7) & 0x1F
for b = 0 to 15
    if b ≥ s then r^b ← RA^{b-s}
    else r^b ← 0x00
end
RT ← r
The shift_count is (0 minus bits 24 to 28 of RB) modulo 32. If the shift_count is less than 16, then RT is set to
the contents of RA, which is shifted right shift_count bytes, and filled at the left with 0x00 bytes. Otherwise, all
bytes of RT are set to 0x00.

s ← (0 - RB_{24:28}) & 0x1F
for b = 0 to 15
    if b ≥ s then r^b ← RA^{b-s}
    else r^b ← 0x00
end
RT ← r
The shift_count is (0 - the preferred word of RB) modulo 8. RT is set to the contents of RA, shifted right by
shift_count bits, filling at the left with zero bits.

s ← (0 - RB_{29:31}) & 0x07
for b = 0 to 127
    if b ≥ s then r_b ← RA_{b-s}
    else r_b ← 0
end
RT ← r
The shift_count is (0 - I7) modulo 8. RT is set to the contents of RA, shifted right by shift_count bits, filling at
the left with zero bits.

s ← (0 - I7) & 0x07
for b = 0 to 127
    if b ≥ s then r_b ← RA_{b-s}
    else r_b ← 0
end
RT ← r
For each of eight halfword slots:
* The shift_count is (0 - RB) modulo 32.
* If the shift_count is less than 16, then RT is set to the contents of RA shifted right shift_count bits,
replicating bit 0 (of the halfword) at the left.
* Otherwise, all bits of this halfword of RT are set to bit 0 of this halfword of RA.

Note: Each halfword slot has its own independent rotate amount.

for j = 0 to 15 by 2
    s ← (0 - RB^{j::2}) & 0x001F
    t ← RA^{j::2}
    for b = 0 to 15
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← t_0
    end
    RT^{j::2} ← r
end
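For a single halfword slot, the algebraic variant can be modeled in Python as shown below. The sketch assumes unsigned 16-bit integers for the slot values; `rotmah_slot` is a hypothetical helper name.

```python
def rotmah_slot(ra, rb):
    """One halfword slot of the algebraic form: shift right by
    shift_count = (0 - RB) modulo 32, replicating the sign bit (bit 0)."""
    s = (0 - rb) & 0x1F
    signed = ra - 0x10000 if ra & 0x8000 else ra
    if s >= 16:
        s = 15  # every result bit is then a copy of the sign bit
    return (signed >> s) & 0xFFFF
```

Clamping the count to 15 is one way to express the "otherwise" case: an arithmetic shift by 15 already fills the halfword with copies of bit 0.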
opcode | I7    | RA    | RT
0-10   | 11-17 | 18-24 | 25-31




















For each of eight halfword slots: 
* The shift_count is (0 - I7) modulo 32.
* If the shift_count is less than 16, then RT is set to the contents of RA shifted right shift_count bits,
replicating bit 0 (of the halfword) at the left.
* Otherwise, all bits of this halfword of RT are set to bit 0 of this halfword of RA.

s ← (0 - RepLeftBit(I7,16)) & 0x001F
for j = 0 to 15 by 2
    t ← RA^{j::2}
    for b = 0 to 15
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← t_0
    end
    RT^{j::2} ← r
end
For each of four word slots:
* The shift_count is (0 - RB) modulo 64.
* If the shift_count is less than 32, then RT is set to the contents of RA shifted right shift_count bits,
replicating bit 0 (of the word) at the left.
* Otherwise, all bits of this word of RT are set to bit 0 of this word of RA.

for j = 0 to 15 by 4
    s ← (0 - RB^{j::4}) & 0x0000003F
    t ← RA^{j::4}
    for b = 0 to 31
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← t_0
    end
    RT^{j::4} ← r
end
For each of four word slots:
* The shift_count is (0 - I7) modulo 64.
* If the shift_count is less than 32, then RT is set to the contents of RA shifted right shift_count bits,
replicating bit 0 (of the word) at the left.
* Otherwise, all bits of this word of RT are set to bit 0 of this word of RA.

s ← (0 - RepLeftBit(I7,32)) & 0x0000003F
for j = 0 to 15 by 4
    t ← RA^{j::4}
    for b = 0 to 31
        if b ≥ s then r_b ← t_{b-s}
        else r_b ← t_0
    end
    RT^{j::4} ← r
end
7. Compare, Branch, and Halt Instructions


This section lists and describes the SPU compare, branch, and halt instructions. For more information about 
the SPU interrupt facility, see Section 12 on page 251. 


Conditional branch instructions operate by examining a value in a register, rather than by accessing a 
specialized condition code register. The value is taken from the preferred slot. It is usually set by a compare 
instruction. 


Compare instructions perform a comparison of the values in two registers or a value in a register and an 
immediate value. The result is indicated by setting into the target register a result value that is the same width 
as the register operands. If the comparison condition is met, the value is all one bits; if not, the value is all 
zero bits. 


Logical comparison instructions treat the operands as unsigned integers. Other compare instructions treat the 
operands as two's complement signed integers. 


A set of halt instructions is provided that stops execution when the tested condition is met. These are 
intended to be used, for example, to check addresses or subscript ranges in situations where failure to meet 
the condition is regarded as a serious error. The stop that occurs is not precise; as a result, execution
generally cannot be restarted.


Floating-point compare instructions are listed in Section 9 Floating-Point Instructions on page 195 with the 
other floating-point instructions. 
The value in the preferred slot of register RA is compared with the value in the preferred slot of register RB. If
the values are equal, execution of the program stops at or after the halt. 


Programming Note: RT is a false target. Implementations can schedule instructions as though this instruc- 
tion produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not to 
appear to source data for nearby subsequent instructions. False targets are not written. 


if RA^{0:3} = RB^{0:3} then
    Stop after executing zero or more instructions after the halt.
The value in the I10 field is extended to 32 bits by replicating the leftmost bit. The result is compared to the
value in the preferred slot of register RA. If the value from register RA is equal to the immediate value,
execution of the SPU program stops at or after the halt instruction.


Programming Note: RT is a false target. Implementations can schedule instructions as though this instruc- 
tion produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not to 
appear to source data for nearby subsequent instructions. False targets are not written. 





if RA^{0:3} = RepLeftBit(I10,32) then
    Stop after executing zero or more instructions after the halt.
The value in the preferred slot of register RA is algebraically compared with the value in the preferred slot of
register RB. If the value from register RA is greater than the RB value, execution of the SPU program stops at 
or after the halt instruction. 


Programming Note: RT is a false target. Implementations can schedule instructions as though this instruc- 
tion produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not to 
appear to source data for nearby subsequent instructions. False targets are not written. 





if RA^{0:3} > RB^{0:3} then
    Stop after executing zero or more instructions after the halt.
The value in the I10 field is extended to 32 bits by replicating the leftmost bit. The result is algebraically
compared to the value in the preferred slot of register RA. If the value from register RA is greater than the
immediate value, execution of the SPU program stops at or after the halt instruction.


Programming Note: RT is a false target. Implementations can schedule instructions as though this instruc- 
tion produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not to 
appear to source data for nearby subsequent instructions. False targets are not written. 





if RA^{0:3} > RepLeftBit(I10,32) then
    Stop after executing zero or more instructions after the halt.
The value in the preferred slot of register RA is logically compared with the value in the preferred slot of
register RB. If the value from register RA is greater than the value from register RB, execution of the SPU 
program stops at or after the halt instruction. 


Programming Note: RT is a false target. Implementations can schedule instructions as though this instruc- 
tion produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not to 
appear to source data for nearby subsequent instructions. False targets are not written. 





if RA^{0:3} >^u RB^{0:3} then
    Stop after executing zero or more instructions after the halt.
Programming Note: RT is a false target. Implementations can schedule instructions as though this instruc-
tion produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not to 
appear to source data for nearby subsequent instructions. False targets are not written. 





if RA^{0:3} >^u RepLeftBit(I10,32) then
    Stop after executing zero or more instructions after the halt.
For each of 16 byte slots:
* The operand from register RA is compared with the operand from register RB. If the operands are equal,
a result of all one bits (true) is produced. If they are unequal, a result of all zero bits (false) is produced.
* The 8-bit result is placed in register RT.

for i = 0 to 15
    if RA^i = RB^i then RT^i ← 0xFF
    else RT^i ← 0x00
end
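A short Python sketch of the byte compare, assuming each register is modeled as a 16-byte `bytes` object. The function name `ceqb` is a hypothetical helper used only for illustration.

```python
def ceqb(ra: bytes, rb: bytes) -> bytes:
    """Compare 16 byte slots; each result byte is 0xFF where the
    operands are equal and 0x00 where they differ."""
    assert len(ra) == 16 and len(rb) == 16
    return bytes(0xFF if a == b else 0x00 for a, b in zip(ra, rb))
```

The result is a per-slot mask of the same width as the operands, so it can be combined with other registers by ordinary bitwise operations.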
For each of 16 byte slots:
* The value in the rightmost 8 bits of the I10 field is compared with the value in register RA. If the two
values are equal, a result of all one bits (true) is produced. If they are unequal, a result of all zero bits (false)
is produced.
* The 8-bit result is placed in register RT.

for i = 0 to 15
    if RA^i = I10_{2:9} then RT^i ← 0xFF
    else RT^i ← 0x00
end
For each of eight halfword slots:
* The operand from register RA is compared with the operand from register RB. If the operands are equal,
a result of all one bits (true) is produced. If they are unequal, a result of all zero bits (false) is produced.
* The 16-bit result is placed in register RT.

for i = 0 to 15 by 2
    if RA^{i::2} = RB^{i::2} then RT^{i::2} ← 0xFFFF
    else RT^{i::2} ← 0x0000
end
For each of eight halfword slots:
* The value in the I10 field is extended to 16 bits by replicating its leftmost bit and compared with the value
in register RA. If the two values are equal, a result of all one bits (true) is produced. If they are unequal, a
result of all zero bits (false) is produced.
* The 16-bit result is placed in register RT.

for i = 0 to 15 by 2
    if RA^{i::2} = RepLeftBit(I10,16) then RT^{i::2} ← 0xFFFF
    else RT^{i::2} ← 0x0000
end
For each of four word slots:
* The operand from register RA is compared with the operand from register RB. If the operands are equal,
a result of all one bits (true) is produced. If they are unequal, a result of all zero bits (false) is produced.
* The 32-bit result is placed in register RT.

for i = 0 to 15 by 4
    if RA^{i::4} = RB^{i::4} then RT^{i::4} ← 0xFFFFFFFF
    else RT^{i::4} ← 0x00000000
end
For each of four word slots:
* The value in the I10 field is extended to 32 bits by replicating its leftmost bit and compared with the value in
register RA. If the two values are equal, a result of all one bits (true) is produced. If they are unequal, a result
of all zero bits (false) is produced.
* The 32-bit result is placed in register RT.

for i = 0 to 15 by 4
    if RA^{i::4} = RepLeftBit(I10,32) then RT^{i::4} ← 0xFFFFFFFF
    else RT^{i::4} ← 0x00000000
end
For each of 16 byte slots:
* The operand from register RA is algebraically compared with the operand from register RB. If the operand
in register RA is greater than the operand in register RB, a result of all one bits (true) is produced.
Otherwise, a result of all zero bits (false) is produced.
* The 8-bit result is placed in register RT.

for i = 0 to 15
    if RA^i > RB^i then RT^i ← 0xFF
    else RT^i ← 0x00
end
For each of 16 byte slots:
* The value in the rightmost 8 bits of the I10 field is algebraically compared with the value in register RA. If
the value in register RA is greater, a result of all one bits (true) is produced. Otherwise, a result of all zero
bits (false) is produced.
* The 8-bit result is placed in register RT.

for i = 0 to 15
    if RA^i > I10_{2:9} then RT^i ← 0xFF
    else RT^i ← 0x00
end
For each of eight halfword slots:
* The operand from register RA is algebraically compared with the operand from register RB. If the operand
in register RA is greater than the operand in register RB, a result of all one bits (true) is produced.
Otherwise, a result of all zero bits (false) is produced.
* The 16-bit result is placed in register RT.

for i = 0 to 15 by 2
    if RA^{i::2} > RB^{i::2} then RT^{i::2} ← 0xFFFF
    else RT^{i::2} ← 0x0000
end
For each of eight halfword slots:
* The value in the I10 field is extended to 16 bits and algebraically compared with the value in register RA.
If the value in register RA is greater than the I10 value, a result of all one bits (true) is produced.
Otherwise, a result of all zero bits (false) is produced.
* The 16-bit result is placed in register RT.

for i = 0 to 15 by 2
    if RA^{i::2} > RepLeftBit(I10,16) then RT^{i::2} ← 0xFFFF
    else RT^{i::2} ← 0x0000
end
For each of four word slots:
* The operand from register RA is algebraically compared with the operand from register RB. If the operand
in register RA is greater than the operand in register RB, a result of all one bits (true) is produced.
Otherwise, a result of all zero bits (false) is produced.
* The 32-bit result is placed in register RT.

for i = 0 to 15 by 4
    if RA^{i::4} > RB^{i::4} then RT^{i::4} ← 0xFFFFFFFF
    else RT^{i::4} ← 0x00000000
end
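The difference between an algebraic (signed) and a logical (unsigned) word compare can be illustrated with the Python sketch below. Registers are assumed to be lists of four unsigned 32-bit integers; `to_signed32`, `cgt`, and `clgt` are hypothetical helper names modeled on the mnemonics.

```python
def to_signed32(x):
    """Reinterpret an unsigned 32-bit value as two's complement."""
    return x - (1 << 32) if x & 0x80000000 else x


def cgt(ra, rb):
    """Algebraic compare: operands are treated as signed integers."""
    return [0xFFFFFFFF if to_signed32(a) > to_signed32(b) else 0
            for a, b in zip(ra, rb)]


def clgt(ra, rb):
    """Logical compare: operands are treated as unsigned integers."""
    return [0xFFFFFFFF if a > b else 0 for a, b in zip(ra, rb)]
```

The two forms disagree only when the operands differ in their leftmost bit: 0xFFFFFFFF is -1 algebraically but the largest value logically.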
For each of four word slots:
* The value in the I10 field is extended to 32 bits by sign extension and algebraically compared with the
value in register RA. If the value in register RA is greater than the I10 value, a result of all one bits (true)
is produced. Otherwise, a result of all zero bits (false) is produced.
* The 32-bit result is placed in register RT.

for i = 0 to 15 by 4
    if RA^{i::4} > RepLeftBit(I10,32) then RT^{i::4} ← 0xFFFFFFFF
    else RT^{i::4} ← 0x00000000
end
For each of 16 byte slots:
* The operand from register RA is logically compared with the operand from register RB. If the operand in
register RA is logically greater than the operand in register RB, a result of all one bits (true) is produced.
Otherwise, a result of all zero bits (false) is produced.
* The 8-bit result is placed in register RT.

for i = 0 to 15
    if RA^i >^u RB^i then RT^i ← 0xFF
    else RT^i ← 0x00
end
For each of 16 byte slots:
* The value in the rightmost 8 bits of the I10 field is logically compared with the value in register RA. If the
value in register RA is logically greater, a result of all one bits (true) is produced. Otherwise, a result of all
zero bits (false) is produced.
* The 8-bit result is placed in register RT.

for i = 0 to 15
    if RA^i >^u I10_{2:9} then RT^i ← 0xFF
    else RT^i ← 0x00
end
For each of eight halfword slots:
* The operand from register RA is logically compared with the operand from register RB. If the operand in
register RA is logically greater than the operand in register RB, a result of all one bits (true) is produced.
Otherwise, a result of all zero bits (false) is produced.
* The 16-bit result is placed in register RT.

for i = 0 to 15 by 2
    if RA^{i::2} >^u RB^{i::2} then RT^{i::2} ← 0xFFFF
    else RT^{i::2} ← 0x0000
end
For each of eight halfword slots:
* The value in the I10 field is extended to 16 bits by replicating the leftmost bit and logically compared with
the value in register RA. If the value in register RA is logically greater than the I10 value, a result of all one
bits (true) is produced. Otherwise, a result of all zero bits (false) is produced.
* The 16-bit result is placed in register RT.

for i = 0 to 15 by 2
    if RA^{i::2} >^u RepLeftBit(I10,16) then RT^{i::2} ← 0xFFFF
    else RT^{i::2} ← 0x0000
end
For each of four word slots:
* The operand from register RA is logically compared with the operand from register RB. If the operand in
register RA is logically greater than the operand in register RB, a result of all one bits (true) is produced.
Otherwise, a result of all zero bits (false) is produced.
* The 32-bit result is placed in register RT.

for i = 0 to 15 by 4
    if RA^{i::4} >^u RB^{i::4} then RT^{i::4} ← 0xFFFFFFFF
    else RT^{i::4} ← 0x00000000
end
For each of four word slots:
* The value in the I10 field is extended to 32 bits by sign extension and logically compared with the value in
register RA. If the value in register RA is logically greater than the I10 value, a result of all one bits (true)
is produced. Otherwise, a result of all zero bits (false) is produced.
* The 32-bit result is placed in register RT.

for i = 0 to 15 by 4
    if RA^{i::4} >^u RepLeftBit(I10,32) then RT^{i::4} ← 0xFFFFFFFF
    else RT^{i::4} ← 0x00000000
end
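One consequence of the description above is that the immediate is sign-extended first and only then treated as unsigned. The Python sketch below illustrates this, assuming a list of four unsigned 32-bit integers models the register; `rep_left_bit` and `clgti` are hypothetical helper names.

```python
def rep_left_bit(value, from_bits, to_bits):
    """Model of RepLeftBit: sign-extend a from_bits-wide field to to_bits."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):
        value |= ((1 << to_bits) - 1) ^ ((1 << from_bits) - 1)
    return value


def clgti(ra, i10):
    """Logical compare of each word slot against the sign-extended I10."""
    imm = rep_left_bit(i10, 10, 32)
    return [0xFFFFFFFF if a > imm else 0 for a in ra]
```

With I10 = 0x3FF (all ones), the extended immediate is 0xFFFFFFFF, the largest unsigned word, so no slot can compare logically greater.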
Execution proceeds with the target instruction. The address of the target instruction is computed by adding
the value of the I16 field, extended on the right with two zero bits with the result treated as a signed quantity,
to the address of the Branch Relative instruction.

Programming Note: If the value of the I16 field is zero, an infinite one-instruction loop is executed.

PC ← (PC + RepLeftBit(I16 || 0b00,32)) & LSLR
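The target computation can be sketched in Python as follows. The default `lslr` value of 0x3FFFF is purely an assumption for illustration (it would correspond to a 256 KB local store); the actual Local Store Limit Register value is implementation dependent. `br_target` is a hypothetical helper name.

```python
def br_target(pc, i16, lslr=0x3FFFF):
    """Compute the Branch Relative target: sign-extend (I16 || 0b00) to
    32 bits, add it to the branch address, and mask with LSLR."""
    offset = (i16 & 0xFFFF) << 2          # append two zero bits (18 bits)
    if offset & (1 << 17):                # replicate the leftmost bit
        offset |= 0xFFFFFFFF ^ ((1 << 18) - 1)
    return (pc + offset) & lslr           # lslr default is an assumption
```

An I16 of zero returns the branch's own address, which is the one-instruction loop mentioned in the programming note; masking with LSLR makes addresses wrap within the local store.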
Execution proceeds with the target instruction. The address of the target instruction is the value of the I16
field, extended on the right with two zero bits and extended on the left with copies of the most-significant bit.

PC ← RepLeftBit(I16 || 0b00,32) & LSLR
Execution proceeds with the target instruction. In addition, a link register is set.

The address of the target instruction is computed by adding the value of the I16 field, extended on the right
with two zero bits with the result treated as a signed quantity, to the address of the Branch Relative and Set
Link instruction.

The preferred slot of register RT is set to the address of the byte following the Branch Relative and Set Link
instruction. The remaining slots of register RT are set to zero.

Programming Note: If the value of the I16 field is zero, an infinite one-instruction loop is executed.

RT^{0:3} ← (PC + 4) & LSLR
RT^{4:15} ← 0
PC ← (PC + RepLeftBit(I16 || 0b00,32)) & LSLR
Execution proceeds with the target instruction. In addition, a link register is set.

The address of the target instruction is the value of the I16 field, extended on the right with two zero bits and
extended on the left with copies of the most-significant bit.

The preferred slot of register RT is set to the address of the byte following the Branch Absolute and Set Link
instruction. The remaining slots of register RT are set to zero.

RT^{0:3} ← (PC + 4) & LSLR
RT^{4:15} ← 0
PC ← RepLeftBit(I16 || 0b00,32) & LSLR
Version 1.2 Compare, Branch, and Halt Instructions 


January 27, 2007 Page 177 of 278 










Branch Indirect Required v 1.0 
bi ra 

0 0 1 1 0 1 0 1 0 0 0 | /  | D  | E  | / / / / | RA    | / / /
0-10                  | 11 | 12 | 13 | 14-17   | 18-24 | 25-31






































Execution proceeds with the instruction addressed by the preferred slot of register RA. The rightmost 2 bits of 
the value in register RA are ignored and assumed to be zero. Interrupts can be enabled or disabled with the E 
or D feature bits (see Section 12 SPU Interrupt Facility on page 251). 





PC ← RA^{0:3} & LSLR & 0xFFFFFFFC
if (E = 0 and D = 0) then interrupt enable status is not modified
else if (E = 1 and D = 0) then enable interrupts at target
else if (E = 0 and D = 1) then disable interrupts at target
else if (E = 1 and D = 1) then reserved
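The following Python sketch models the target address and the E/D feature-bit decoding for Branch Indirect. The default `lslr` of 0x3FFFF is an assumed value for illustration only, and the returned action strings are hypothetical labels, not architected names.

```python
def bi(ra_preferred_word, e, d, lslr=0x3FFFF):
    """Return (target, interrupt_action) for Branch Indirect.
    The rightmost 2 bits of the address are forced to zero."""
    target = ra_preferred_word & lslr & 0xFFFFFFFC
    actions = {
        (0, 0): "unchanged",  # interrupt enable status is not modified
        (1, 0): "enable",     # enable interrupts at target
        (0, 1): "disable",    # disable interrupts at target
        (1, 1): "reserved",
    }
    return target, actions[(e, d)]
```

Forcing the low 2 bits to zero means an unaligned value in RA still yields a word-aligned instruction address.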
Execution proceeds with the instruction addressed by SRR0. RA is considered to be a valid source whose
value is ignored. Interrupts can be enabled or disabled with the E or D feature bits (see Section 12 SPU
Interrupt Facility on page 251).

PC ← SRR0
if (E = 0 and D = 0) then interrupt enable status is not modified
else if (E = 1 and D = 0) then enable interrupts at target
else if (E = 0 and D = 1) then disable interrupts at target
else if (E = 1 and D = 1) then reserved
The external condition is examined. If it is false, execution continues with the next sequential instruction. If the
external condition is true, the effective address of the next instruction is taken from the preferred word slot of 
register RA. 


The address of the instruction following the bisled instruction is placed into the preferred word slot of register 
RT; the remainder of register RT is set to zero. 


If the branch is taken, interrupts can be enabled or disabled with the E or D feature bits (see Section 12 SPU 
Interrupt Facility on page 251). 





u ← LSLR & (PC + 4)
t ← RA^{0:3} & LSLR & 0xFFFFFFFC
RT^{0:3} ← u
RT^{4:15} ← 0
if (external event) then
    PC ← t
    if (E = 0 and D = 0) then interrupt enable status is not modified
    else if (E = 1 and D = 0) then enable interrupts at target
    else if (E = 0 and D = 1) then disable interrupts at target
    else if (E = 1 and D = 1) then reserved
else
    PC ← u
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The effective address of the next instruction is taken from the preferred word slot of register RA, with the 
rightmost 2 bits assumed to be zero. The address of the instruction following the bisl instruction is placed into 
the preferred word slot of register RT. The remainder of register RT is set to zero. Interrupts can be enabled 
or disabled with the E or D feature bits (see Section 12 SPU Interrupt Facility on page 251). 





t ← RA^{0:3} & LSLR & 0xFFFFFFFC
u ← LSLR & (PC + 4)
RT^{0:3} ← u
RT^{4:15} ← 0x00
PC ← t


if (E = 0 and D = 0) then interrupt enable status is not modified 
else if (E = 1 and D = 0) then enable interrupts at target 

else if (E = 0 and D = 1) then disable interrupts at target 

else if (E = 1 and D = 1) then reserved 
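
A minimal Python sketch of the branch-and-set-link data flow described above follows. The function name, the 16-byte `bytes` representation of RT, and the default LSLR value of 0x3FFFF are assumptions made for illustration; interrupt handling is left out.

```python
def branch_and_set_link(pc: int, ra_preferred_word: int, lslr: int = 0x3FFFF):
    """Return (new_pc, rt) for an indirect branch that saves a link address.

    rt is modeled as 16 big-endian bytes: the link address occupies the
    preferred word slot (bytes 0:3) and the remaining bytes 4:15 are zero.
    """
    target = ra_preferred_word & lslr & 0xFFFFFFFC   # t in the pseudocode
    link = lslr & (pc + 4)                           # u in the pseudocode
    rt = link.to_bytes(4, "big") + bytes(12)
    return target, rt
```

Calling `branch_and_set_link(0x100, 0x2003)` yields a new PC of 0x2000 and a link address of 0x104 in the preferred slot of RT.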
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Examine the preferred slot; if it is not zero, proceed with the branch target. Otherwise, proceed with the next 
instruction. 


The address of the branch target is computed by appending two zero bits to the value of the I16 field,
extending it on the left with copies of the most-significant bit, and adding it to the value of the instruction
counter.





If RT^{0:3} ≠ 0 then
    PC ← (PC + RepLeftBit(I16 || 0b00)) & LSLR & 0xFFFFFFFC
else
    PC ← (PC + 4) & LSLR
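
The relative target computation can be illustrated with a short Python sketch. The helper names and the default LSLR value of 0x3FFFF are assumptions for the example; the 18-bit width passed to the sign extension is the 16-bit I16 field plus the two appended zero bits.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def relative_target(pc: int, i16: int, lslr: int = 0x3FFFF) -> int:
    """Model of (PC + RepLeftBit(I16 || 0b00)) & LSLR & 0xFFFFFFFC."""
    offset = sign_extend((i16 & 0xFFFF) << 2, 18)
    return (pc + offset) & lslr & 0xFFFFFFFC


def brnz_next_pc(pc: int, rt_preferred_word: int, i16: int, lslr: int = 0x3FFFF) -> int:
    """Branch if the preferred word slot of RT is not zero."""
    if rt_preferred_word & 0xFFFFFFFF != 0:
        return relative_target(pc, i16, lslr)
    return (pc + 4) & lslr
```

An I16 value of 0xFFFF encodes an offset of one word backward, so a branch at 0x100 with that field targets 0xFC.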
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Examine the preferred slot. If it is zero, proceed with the branch target. Otherwise, proceed with the 
next instruction. 


The address of the branch target is computed by appending two zero bits to the value of the I16 field,
extending it on the left with copies of the most-significant bit, and adding it to the value of the
instruction counter.


If RT^{0:3} = 0 then
    PC ← (PC + RepLeftBit(I16 || 0b00)) & LSLR & 0xFFFFFFFC
else
    PC ← (PC + 4) & LSLR
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Examine the preferred slot. If the rightmost halfword is not zero, proceed with the branch target. Otherwise, 
proceed with the next instruction. 


The address of the branch target is computed by appending two zero bits to the value of the I16 field,
extending it on the left with copies of the most-significant bit, and adding it to the value of the instruction
counter.





If RT^{2:3} ≠ 0 then
    PC ← (PC + RepLeftBit(I16 || 0b00)) & LSLR & 0xFFFFFFFC
else
    PC ← (PC + 4) & LSLR
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Examine the preferred slot. If the rightmost halfword is zero, proceed with the branch target. Otherwise, 
proceed with the next instruction. 


The address of the branch target is computed by appending two zero bits to the value of the I16 field,
extending it on the left with copies of the most-significant bit, and adding it to the value of the instruction
counter.


If RT^{2:3} = 0 then
    PC ← (PC + RepLeftBit(I16 || 0b00)) & LSLR & 0xFFFFFFFC
else
    PC ← (PC + 4) & LSLR
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If the preferred slot of register RT is not zero, execution proceeds with the next sequential instruction. Other- 
wise, execution proceeds at the address in the preferred slot of register RA, treating the rightmost 2 bits as 
zero. If the branch is taken, interrupts can be enabled or disabled with the E or D feature bits (see Section 12 
SPU Interrupt Facility on page 251). 





t ← RA^{0:3} & LSLR & 0xFFFFFFFC
u ← LSLR & (PC + 4)

If RT^{0:3} = 0 then
    PC ← t & LSLR & 0xFFFFFFFC
    if (E = 0 and D = 0) then interrupt enable status is not modified
    else if (E = 1 and D = 0) then enable interrupts at target
    else if (E = 0 and D = 1) then disable interrupts at target
    else if (E = 1 and D = 1) then reserved
else
    PC ← u
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If the preferred slot of register RT is zero, execution proceeds with the next sequential instruction. Otherwise, 
execution proceeds at the address in the preferred slot of register RA, treating the rightmost 2 bits as zero. If 
the branch is taken, interrupts can be enabled or disabled with the E or D feature bits (see Section 12 SPU 
Interrupt Facility on page 251). 





t ← RA^{0:3} & LSLR & 0xFFFFFFFC
u ← LSLR & (PC + 4)

If RT^{0:3} ≠ 0 then
    PC ← t & LSLR & 0xFFFFFFFC
    if (E = 0 and D = 0) then interrupt enable status is not modified
    else if (E = 1 and D = 0) then enable interrupts at target
    else if (E = 0 and D = 1) then disable interrupts at target
    else if (E = 1 and D = 1) then reserved
else
    PC ← u
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If the rightmost halfword of the preferred slot of register RT is not zero, execution proceeds with the next 
sequential instruction. Otherwise, execution proceeds at the address in the preferred slot of register RA, 
treating the rightmost 2 bits as zero. If the branch is taken, interrupts can be enabled or disabled with the E or 
D feature bits (see Section 12 SPU Interrupt Facility on page 251). 





t ← RA^{0:3} & LSLR & 0xFFFFFFFC
u ← LSLR & (PC + 4)

If RT^{2:3} = 0 then
    PC ← t & LSLR & 0xFFFFFFFC
    if (E = 0 and D = 0) then interrupt enable status is not modified
    else if (E = 1 and D = 0) then enable interrupts at target
    else if (E = 0 and D = 1) then disable interrupts at target
    else if (E = 1 and D = 1) then reserved
else
    PC ← u
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If the rightmost halfword of the preferred slot of register RT is zero, execution proceeds with the next sequen- 
tial instruction. Otherwise, execution proceeds at the address in the preferred slot of register RA, treating the 
rightmost 2 bits as zero. If the branch is taken, interrupts can be enabled or disabled with the E or D feature 
bits (see Section 12 SPU Interrupt Facility on page 251). 





t ← RA^{0:3} & LSLR & 0xFFFFFFFC
u ← LSLR & (PC + 4)

If RT^{2:3} ≠ 0 then
    PC ← t & LSLR & 0xFFFFFFFC
    if (E = 0 and D = 0) then interrupt enable status is not modified
    else if (E = 1 and D = 0) then enable interrupts at target
    else if (E = 0 and D = 1) then disable interrupts at target
    else if (E = 1 and D = 1) then reserved
else
    PC ← u
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8. Hint-for-Branch Instructions 


This section lists and describes the SPU hint-for-branch instructions. 


These instructions have no semantics. They provide a hint to the implementation about a future branch 
instruction, with the intention that the information be used to improve performance by either prefetching the 
branch target or by other means. 


Each of the hint-for-branch instructions specifies the address of a branch instruction and the address of the
expected branch target. If the expectation is that the branch is not taken, the target address is the
address of the instruction following the branch.


The instructions in this section use the variables brinst and brtarg, which are defined as follows: 


* brinst = RO
* brtarg = I16
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The address of the branch target is given by the contents of the preferred slot of register RA. The RO field 
gives the signed word offset from the hbr instruction to the branch instruction. 


If the P feature bit is set, hbr does not hint a branch. Instead, it hints that this is the proper implementation- 
specific moment to perform inline prefetching. Inline prefetching is the instruction fetch function necessary to 
run linearly sequential program text. To obtain optimal performance, some implementations of the SPU may 
require help scheduling these inline prefetches of local storage when the program is also doing loads and 
stores. See the implementation-specific SPU documentation for information about when this might be beneficial.
When the P feature bit is set, the instruction ignores the value of RA. The relative offset (RO) field,
formed by concatenating ROH (high) and ROL (low), must be set to zero.


branch target address ← RA^{0:3} & LSLR & 0xFFFFFFFC
branch instruction address ← (RepLeftBit(ROH || ROL || 0b00,32) + PC) & LSLR













Hint for Branch (a-form)                                Required v 1.0
hbra brinst,brtarg

Bits     0:6        7:8    9:24    25:31
Field    0001000    ROH    I16     ROL




















The address of the branch target is specified by an address in the I16 field. The value has 2 bits of zero
appended on the right before it is used.


The RO field, formed by concatenating ROH (high) and ROL (low), gives the signed word offset from the hbra 
instruction to the branch instruction. 





branch target address ← RepLeftBit(I16 || 0b00,32) & LSLR
branch instruction address ← (RepLeftBit(ROH || ROL || 0b00,32) + PC) & LSLR
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Hint for Branch Relative Required v 1.0 
hbrr brinst,brtarg 

Bits     0:6        7:8    9:24    25:31
Field    0001001    ROH    I16     ROL




















The address of the branch target is specified by a word offset given in the I16 field. The signed I16 field is
added to the address of the hbrr instruction to determine the absolute address of the branch target.


The RO field, formed by concatenating ROH (high) and ROL (low), gives the signed word offset from the hbrr 
instruction to the branch instruction. 





branch target address ← (RepLeftBit(I16 || 0b00,32) + PC) & LSLR
branch instruction address ← (RepLeftBit(ROH || ROL || 0b00,32) + PC) & LSLR
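
The two addresses named by a relative hint can be modeled with the Python sketch below. It assumes the field widths implied by the encoding layout (ROH in bits 7:8, I16 in bits 9:24, ROL in bits 25:31, so RO is 9 bits wide); the function names and the default LSLR value of 0x3FFFF are hypothetical and chosen only for illustration.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    value &= (1 << bits) - 1
    return (value ^ sign) - sign


def hbrr_addresses(pc: int, roh: int, rol: int, i16: int, lslr: int = 0x3FFFF):
    """Return (branch_instruction_address, branch_target_address) for hbrr."""
    ro = ((roh & 0x3) << 7) | (rol & 0x7F)             # RO = ROH || ROL (9 bits)
    branch_inst = (pc + sign_extend(ro << 2, 11)) & lslr
    branch_target = (pc + sign_extend((i16 & 0xFFFF) << 2, 18)) & lslr
    return branch_inst, branch_target
```

With the hint at 0x100, RO = 3, and I16 = 0x10, the hinted branch is at 0x10C and its expected target is 0x140.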
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9. Floating-Point Instructions 


This section describes the SPU floating-point instructions. This section also describes the differences 
between SPU floating-point calculations and IEEE standard floating-point calculations. The single-precision, 
floating-point instructions do not calculate results compliant with IEEE Standard 754. However, the data
formats for single-precision and double-precision floating-point numbers used in the SPU are the same as the 
IEEE Standard 754. 


Implementation Note: The architecture allows implementations to produce different results for floating-point
instructions. See the implementation-specific documentation for information about the results produced by an
implementation. Achieving the same results across implementations requires more than architectural
compliance.


9.1 Single Precision (Extended-Range Mode) 


For single-precision operations, the range of normalized numbers is extended. However, the full range 
defined in the standard is not implemented. The range of nonzero numbers that can be represented and 
operated on in the SPU is between the minimum and maximum listed in Table 9-1. Table 9-1 also demonstrates
converting from a register value to a decimal value.


Table 9-1. Single-Precision (Extended-Range Mode) Minimum and Maximum Values

Number Format                      | Minimum Positive Magnitude (Smin) | Maximum Positive Magnitude (Smax) | Notes
-----------------------------------+-----------------------------------+-----------------------------------+------
Register Value                     | 0x00800000                        | 0x7FFFFFFF                        |
Bit Fields: Sign                   | 0                                 | 0                                 |
Bit Fields: 8-Bit Biased Exponent  | 00000001                          | 11111111                          | 1
Bit Fields: Fraction               | [1.]000...000                     | [1.]111...111                     |
  (implied [1] and 23 bits)        |                                   |                                   |
Value in Powers of 2               | + 2^(1 - 127) * 1                 | + 2^(255 - 127) * (2 - 2^-23)     | 2
Combined Exponent and Fraction     | 2^-126 * (+1)                     | 2^128 * (+[2 - 2^-23])            |
Value of Register in Decimal       | 1.2 * 10^-38                      | 6.8 * 10^38                       |

Notes:
1. The exponent field is biased by +127.
2. The value 2 - 2^-23 is one least significant bit (LSb) less than 2.
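
The register-to-value conversion that Table 9-1 demonstrates can be reproduced with a small Python sketch. The function name is an assumption, and exact rational arithmetic (`fractions.Fraction`) is used only so that the results can be compared without rounding; this is a model of the extended-range interpretation, not of any hardware data path.

```python
from fractions import Fraction


def extended_single_value(reg: int) -> Fraction:
    """Interpret a 32-bit pattern as an SPU extended-range single-precision value."""
    sign = -1 if (reg >> 31) & 1 else 1
    exponent = (reg >> 23) & 0xFF
    fraction = reg & 0x7FFFFF
    if exponent == 0:
        # Zero and denormal patterns are both treated as zero.
        return Fraction(0)
    # Every nonzero exponent, including 255, denotes a normalized number.
    return sign * Fraction(2) ** (exponent - 127) * (1 + Fraction(fraction, 1 << 23))
```

Under this interpretation, 0x00800000 evaluates to 2^-126 (Smin) and 0x7FFFFFFF to 2^128 * (2 - 2^-23) (Smax), while the IEEE infinity pattern 0x7F800000 evaluates to 2^128.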














Zero has two representations:
* For a positive zero, all bits are zero; that is, the sign, exponent, and fraction are zero.
* For a negative zero, the sign is one, and the exponent and fraction are zero.

As inputs, both kinds of zero are supported; however, a zero result is always a positive zero.


Single-precision operations in the SPU have the following characteristics: 
* Not a Number (NaN) is not supported as an operand and is not produced as a result. 


* Infinity (Inf) is not supported. An operation that produces a magnitude greater than the largest number 
representable in the target floating-point format instead produces a number with the appropriate sign, the 
largest biased exponent, and a magnitude of all (binary) ones. It is important to note that the representa- 
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tion of Inf, which conforms to the IEEE standard, is interpreted by the SPU as a number that is smaller 
than the largest number used on the SPU. 


* Denorms are not supported and are treated as zero. Thus, an operation that would generate a denorm 
under IEEE rules instead generates a positive zero. If a denorm is used as an operand, it is treated as a 
zero. 


* The only supported rounding mode is truncation (toward zero). 
For single-precision extended-range arithmetic, four kinds of exception conditions are tested: overflow, 
underflow, divide-by-zero, and IEEE noncompliant result. 
* Overflow (OVF)
An overflow exception occurs when the magnitude of the result before rounding is bigger than the largest
positive representable number, Smax. If the operation in slice k produces an overflow, the OVF flag for
slice k in the Floating-Point Status and Control Register (FPSCR) is set, and the result is saturated to
Smax with the appropriate sign.

* Underflow (UNF)
An underflow exception occurs when the magnitude of the result before rounding is smaller than the
smallest positive representable number, Smin. If the operation in slice k produces an underflow, the UNF
flag for slice k in the FPSCR is set, and the result is saturated to a positive zero.

* Divide-by-Zero (DBZ)
A divide-by-zero exception occurs when the input of an estimate instruction has a zero exponent. If the
operation in slice k produces a divide-by-zero exception, the DBZ flag for slice k in the FPSCR is set.

* IEEE noncompliant result (DIFF)
A different-from-IEEE exception indicates that the result produced with extended-range arithmetic could
be different from the IEEE result. This occurs when one of the following conditions exists:

  - Any of the inputs or the result has a maximal exponent. (IEEE arithmetic treats such an operand as
    NaN or Infinity; extended-range arithmetic treats it as a normalized value.)

  - Any of the inputs has a zero exponent and a nonzero fraction. (IEEE arithmetic treats such an operand
    as a denormal number; extended-range arithmetic treats it as a zero.)

  - An underflow occurs; that is, the result before rounding is different from zero and the result after
    rounding is zero.

If any of these conditions exists for the operation in slice k, the DIFF flag for slice k in the FPSCR is set.
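
The overflow and underflow rules above can be sketched in Python as follows. This is a simplified model under stated assumptions: the function name and the flag names as Python strings are hypothetical, the exact result is supplied as a `Fraction`, truncation of in-range results to 24 significant bits is deliberately omitted, and DIFF conditions that depend on the inputs are not modeled.

```python
from fractions import Fraction

SMAX = (2 - Fraction(1, 2**23)) * 2**128   # largest extended-range magnitude
SMIN = Fraction(1, 2**126)                 # smallest nonzero magnitude


def classify_extended_result(exact: Fraction):
    """Return (delivered_result, flags) for one slice, ignoring truncation."""
    magnitude = abs(exact)
    flags = set()
    if magnitude > SMAX:
        # Saturate to Smax; the saturated result has a maximal exponent, so DIFF is set too.
        flags.update({"OVF", "DIFF"})
        result = SMAX if exact > 0 else -SMAX
    elif 0 < magnitude < SMIN:
        # A nonzero result that becomes zero is an underflow, which also sets DIFF.
        flags.update({"UNF", "DIFF"})
        result = Fraction(0)
    else:
        result = exact
        if magnitude >= 2**128:
            flags.add("DIFF")              # result has a maximal exponent
    return result, flags
```

For instance, an exact result of 2^130 is delivered as Smax with OVF and DIFF set, and an exact result of -2^-130 is delivered as positive zero with UNF and DIFF set.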


These exceptions can be set only by extended-range floating-point instructions. Table 9-2 lists the instructions
for which exceptions can be set.


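Table 9-2. Instructions and Exception Settings

Instruction                               | Set OVF | Set UNF | Set DBZ | Set DIFF
------------------------------------------+---------+---------+---------+---------
fa, fs, fm, fma, fms, fnms, fi            | Yes     | Yes     | No      | Yes
frest, frsqest                            | No      | No      | Yes     | No
csflt, cuflt                              | Yes     | Yes     | No      | Yes
cflts, cfltu, fceq, fcmeq, fcgt, fcmgt    | No      | No      | No      | No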
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SPU double-precision instructions process 128-bit values as two SIMD double-precision operations. SIMD 
slice 0 processes doubleword 0, and slice 1 processes doubleword 1. For double-precision operations,
normal IEEE semantics and definitions apply. The range of the nonzero numbers supported by this format is 
between the minimum and the maximum listed in Table 9-3. Table 9-3 also demonstrates converting from a 
register value to a decimal value. 


Table 9-3. Double-Precision (IEEE Mode) Minimum and Maximum Values

Number Format                       | Minimum Positive Denormalized     | Maximum Positive Normalized       | Notes
                                    | Magnitude (Dmin)                  | Magnitude (Dmax)                  |
------------------------------------+-----------------------------------+-----------------------------------+------
Register Value                      | 0x0000000000000001                | 0x7FEFFFFFFFFFFFFF                |
Bit Fields: Sign                    | 0                                 | 0                                 |
Bit Fields: 11-Bit Biased Exponent  | 00000000000                       | 11111111110                       | 1, 2
Bit Fields: Fraction                | [0.]000...001                     | [1.]111...111                     |
                                    | (implied [0] and 52 bits for      | (implied [1] and 52 bits for      |
                                    | denormalized number)              | normalized number)                |
Value in Powers of 2                | + 2^(0 + 1 - 1023) * 2^-52        | + 2^(2046 - 1023) * (2 - 2^-52)   | 3, 4
Combined Exponent and Fraction      | 2^-1022 * (2^-52)                 | 2^1023 * (+[2 - 2^-52])           |
Value of Register in Decimal        | 4.9 * 10^-324                     | 1.8 * 10^308                      |

Notes:
1. The exponent is biased by +1023.
2. An exponent field of all ones is reserved for not-a-number (NaN) and infinity.
3. The value 2 - 2^-52 is one LSb less than 2.
4. An extra 1 is added to the exponent for denormalized numbers.
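
Because the double-precision format follows IEEE 754, the two register values in Table 9-3 can be checked on any host with IEEE doubles. The following Python sketch uses the standard `struct` module for the reinterpretation; the function name is an assumption made for the example.

```python
import struct
import sys


def double_from_bits(reg: int) -> float:
    """Reinterpret a 64-bit big-endian register image as an IEEE double."""
    return struct.unpack(">d", reg.to_bytes(8, "big"))[0]


dmin = double_from_bits(0x0000000000000001)   # smallest positive denormal
dmax = double_from_bits(0x7FEFFFFFFFFFFFFF)   # largest finite normalized value
```

Here `dmin` equals 2^-1074, which is 2^(0 + 1 - 1023) * 2^-52, and `dmax` equals (2 - 2^-52) * 2^1023, the largest finite double.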





Double-precision operations in the SPU have the following characteristics: 
* Only a subset of the operations required by the IEEE standard is supported in hardware. 
* All four rounding modes are supported. 


* The rounding modes for the two slices can be controlled independently. The RN0 field (bits 20:21) in the
FPSCR specifies the current rounding mode for slice 0; the RN1 field (bits 22:23) in the FPSCR specifies
the current rounding mode for slice 1.


* The IEEE exceptions are detected and accumulated in the FPSCR. Trapping is not supported. 


* The IEEE standard recognizes two kinds of NaNs. These are values that have the maximum biased exponent
value and a nonzero fraction value. The sign bit is ignored. If the high-order bit of the fraction field is
0b0, then the NaN is a Signaling NaN (SNaN); otherwise, it is a Quiet NaN (QNaN). When a QNaN is the
result of a floating-point operation that has no NaN inputs, the result is always the default QNaN. That is,
the high-order bit of the fraction field is 0b1, all the other bits of the fraction field are zero, and the sign bit
is zero.


* The IEEE standard has very strict rules on the propagation of NaNs. When a QNaN is the result of a
floating-point operation that has at least one NaN input, an SPU implementation can either produce the
default QNaN or one of the input NaN values. If an implementation produces a QNaN result rather than
propagating the proper input NaN (QNaN or SNaN), the NaN flag in the FPSCR is set to signal a possibly
noncompliant result.
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* Some implementations might support denorms only as results. Such an implementation treats denormal 
operands as zeros (this also applies to the setting of the IEEE flags); the sign of the operand is preserved.
Whenever a denormal operand is forced to zero, the DENORM flag in the FPSCR is set to signal
a possibly noncompliant result.
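
The NaN distinctions described in the list above can be expressed as a short Python sketch operating on 64-bit patterns. The function and constant names are assumptions for illustration; the default QNaN pattern follows directly from the description (sign zero, maximum exponent, only the high-order fraction bit set).

```python
DEFAULT_QNAN = 0x7FF8000000000000   # sign 0, exponent all ones, fraction 0b1000...0


def classify_double_nan(bits: int) -> str:
    """Classify a 64-bit double-precision pattern as 'SNaN', 'QNaN', or 'not a NaN'."""
    exponent = (bits >> 52) & 0x7FF
    fraction = bits & ((1 << 52) - 1)
    if exponent != 0x7FF or fraction == 0:
        return "not a NaN"              # finite values and infinities
    # The high-order fraction bit distinguishes quiet (1) from signaling (0).
    return "QNaN" if (fraction >> 51) & 1 else "SNaN"
```

The sign bit plays no part in the classification, so 0xFFF8000000000000 is also reported as a QNaN.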


9.2.1 Conversions Between Single-Precision and Double-Precision Format 


There are two types of conversions: one rounds a double-precision number to a single-precision number 
(frds); the other extends a single-precision number to a double-precision number (fesd). Both operations 
comply with the IEEE standard, except for the handling of denormal inputs. Some implementations may force 
denormal values to zero. When an implementation forces a denormal input to zero, it sets the DENORM flag 
rather than the Underflow flag in the FPSCR. Thus, for these two operations, NaNs, infinities, and denormal 
results are supported in double-precision format as well as in single-precision format. The range of nonzero 
IEEE single-precision numbers supported is between the minimum and the maximum listed in Table 9-4. 
Table 9-4 also demonstrates converting from a register value to a decimal value. 


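Table 9-4. Single-Precision (IEEE Mode) Minimum and Maximum Values

Number Format                      | Minimum Positive Denormalized     | Maximum Positive Normalized       | Notes
                                   | Magnitude (Smin)                  | Magnitude (Smax)                  |
-----------------------------------+-----------------------------------+-----------------------------------+------
Register Value                     | 0x00000001                        | 0x7F7FFFFF                        |
Bit Fields: Sign                   | 0                                 | 0                                 |
Bit Fields: 8-Bit Biased Exponent  | 00000000                          | 11111110                          | 1
Bit Fields: Fraction               | [0.]000...001                     | [1.]111...111                     |
Value in Powers of 2               | + 2^(0 + 1 - 127) * 2^-23         | + 2^(254 - 127) * (2 - 2^-23)     | 2
Combined Exponent and Fraction     | 2^-126 * (2^-23)                  | 2^127 * (2 - 2^-23)               |
Value of Register in Decimal       | 1.4 * 10^-45                      | 3.4 * 10^38                       |

Notes:
1. The exponent field is biased by +127.
2. The value 2 - 2^-23 is 1 LSb less than 2.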











9.2.2 Exception Conditions 


This architecture supports only nontrap exception handling; that is, exception conditions are detected and
reported in the appropriate fields of the FPSCR. These flags are sticky; once set, they remain set until they
are cleared by an FPSCR-write instruction. These exception flags are not set by the single-precision operations
executed in the extended range. Because the double-precision operations are 2-way SIMD, there are
two sets of these flags.


Inexact Result (INX) 
An inexact result is detected when the delivered result value differs from what would have been computed if 
both the exponent range and precision were unbounded. 


Overflow (OVF) 
An overflow occurs when the magnitude of what would have been the rounded result if the exponent range 
were unbounded exceeds that of the largest finite number of the specified result precision. 
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Underflow (UNF) 
For nontrap exception handling, the IEEE 754 standard defines the underflow (UNF) as the following: 


UNF = tiny AND loss of accuracy 


Where there are two definitions each for tiny and loss of accuracy, and the implementation is free to 
choose any of the four combinations. This architecture implements tiny-before-rounding and inexact 
result (INX), thus: 


UNF = tiny before rounding AND inexact result 


Note: Tiny before rounding is detected when a nonzero result value, computed as though the exponent 
range were unbounded, would be less in magnitude than the smallest normalized number. 
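
The underflow definition above (tiny before rounding AND inexact result) can be illustrated for double precision with exact rational arithmetic. This Python sketch is a model under stated assumptions: the function name is hypothetical, the unrounded result is supplied as a `Fraction`, and it relies on the fact that a tiny result is delivered as a multiple of the smallest denormal, 2^-1074.

```python
from fractions import Fraction

MIN_NORMAL = Fraction(1, 2**1022)     # smallest normalized double magnitude
DENORM_STEP = Fraction(1, 2**1074)    # spacing of denormalized doubles


def underflow_flag(exact: Fraction) -> bool:
    """UNF = tiny before rounding AND inexact result."""
    tiny = exact != 0 and abs(exact) < MIN_NORMAL
    if not tiny:
        return False
    # A tiny value is exact only if it is a whole multiple of the denormal step.
    inexact = (exact / DENORM_STEP).denominator != 1
    return inexact
```

A result of exactly 2^-1074 is tiny but representable, so it does not set UNF, whereas 3 * 2^-1075 is tiny and cannot be represented exactly, so it does.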


Invalid Operation (INV) 
An invalid operation exception occurs whenever an operand is invalid for the specified operation. For operations
implemented in hardware, the following operations give rise to an invalid operation exception condition:

* Any floating-point operation on a signaling NaN (SNaN)
* For add, subtract, and fused multiply-add operations, magnitude subtraction of infinities; that is,
infinity - infinity
* Multiplication of infinity by zero


Note: Some implementations may treat denormal inputs as zeros and set both the DENORM flag and 
the Invalid Operation flag. 


Not Propagated NaN (NaN) 

The IEEE standard requires special handling of input NaNs, but SPU implementations can deliver the default
QNaN as a result of double-precision operations. When at least one of the inputs is a NaN, the resulting
QNaN can differ from the result delivered by a design that is fully compliant with the IEEE standard. This is
flagged in the NaN field.


Denormal Input Forced to Zero (DENORM) 

SPU implementations can force certain double-precision denormal operands to zeros before the processing 
of double-precision operations. If an implementation forces these operands to zeros, the zero will preserve 
the sign of the original denormal value. When a denormal input is forced to zero, the DENORM exception flag 
is set in the FPSCR to signal that the result could differ from an IEEE-compliant result. 


Programming Note: Applications that require IEEE-compliant double-precision results can use the NaN and 
DENORM flags in the FPSCR to detect noncompliant results. This allows the code to be re-executed in a less 
efficient but compliant manner. Both flags are sticky, so that large blocks of code can be guarded, minimizing 
the overhead of the code checking. For example, 


    clear fpscr
    fast code block
    if (NaN || DENORM)
    {
        compliant code block
    }
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On SPUs within CBEA-compliant processors, the SPU can stop and signal the PPE to request that the PPE 
perform the calculation and then restart the SPU. 


Table 9-5 lists the instructions for which exceptions can be set. 


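Table 9-5. Instructions and Exception Settings

Instruction                               | Set OVF | Set UNF | Set INX | Set INV | Set NAN | Set DENORM
------------------------------------------+---------+---------+---------+---------+---------+-----------
dfa, dfs, dfm, dfma, dfms, dfnms, dfnma   | Yes     | Yes     | Yes     | Yes     | Yes     | Yes
fesd                                      | No      | No      | No      | Yes     | Yes     | Yes
frds                                      | Yes     | Yes     | Yes     | Yes     | Yes     | Yes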


























9.3 Floating-Point Status and Control Register 


The Floating-Point Status and Control Register (FPSCR) records the status resulting from the floating-point 
operations and controls the rounding mode for double-precision operations. The FPSCR is read by the 
FPSCR read instruction (fscrrd) and written with the FPSCR write instruction (fscrwr). Bits [20:23] are 
control bits; the remaining bits are either status bits or unused. All the status bits in the FPSCR are sticky. 
That is, once set, the sticky bits remain set until they are cleared by an fscrwr instruction. 


The format of the FPSCR is as follows. 











Bits      Description
0:19      Unused
20:21     Rounding control for slice 0 of the 2-way SIMD double-precision operations (RN0)
            00  Round to nearest even
            01  Round towards zero (truncate)
            10  Round towards +infinity
            11  Round towards -infinity
22:23     Rounding control for slice 1 of the 2-way SIMD double-precision operations (RN1)
            00  Round to nearest even
            01  Round towards zero (truncate)
            10  Round towards +infinity
            11  Round towards -infinity
24:28     Unused
29:31     Single-precision exception flags for slice 0
            29  Overflow (OVF)
            30  Underflow (UNF)
            31  Result produced with extended-range arithmetic could be different from the IEEE compliant result (DIFF)
32:49     Unused
50:55     IEEE exception flags for slice 0 of the 2-way SIMD double-precision operations
            50  Overflow (OVF)
            51  Underflow (UNF)
            52  Inexact result (INX)
            53  Invalid operation (INV)
            54  Possibly noncompliant result because of QNaN propagation (NaN)
            55  Possibly noncompliant result because of denormal operand (DENORM)
56:60     Unused
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Bits Description 
61:63     Single-precision exception flags for slice 1 (OVF, UNF, DIFF)
64:81     Unused
82:87     IEEE exception flags for slice 1 of the 2-way SIMD double-precision operations (OVF, UNF, INX, INV, NAN, DENORM)
88:92     Unused
93:95     Single-precision exception flags for slice 2 (OVF, UNF, DIFF)
96:115    Unused
116:119   Single-precision divide-by-zero flags for each of the four slices
            116  DBZ for slice 0
            117  DBZ for slice 1
            118  DBZ for slice 2
            119  DBZ for slice 3
120:124   Unused
125:127   Single-precision exception flags for slice 3 (OVF, UNF, DIFF)
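
As an illustration of the layout above, the following Python sketch extracts fields from a 128-bit FPSCR image held in a Python integer. It assumes that bit 0 is the most-significant bit of the 128-bit value; the function names and the rounding-mode strings are hypothetical and exist only for this example.

```python
ROUNDING_MODES = {
    0b00: "nearest even",
    0b01: "toward zero",
    0b10: "toward +infinity",
    0b11: "toward -infinity",
}


def fpscr_field(fpscr: int, first: int, last: int) -> int:
    """Return bits first:last of a 128-bit value, where bit 0 is the MSB (assumed)."""
    width = last - first + 1
    return (fpscr >> (127 - last)) & ((1 << width) - 1)


def rounding_modes(fpscr: int):
    """Return the double-precision rounding modes for (slice 0, slice 1)."""
    return (ROUNDING_MODES[fpscr_field(fpscr, 20, 21)],
            ROUNDING_MODES[fpscr_field(fpscr, 22, 23)])


def dbz_flags(fpscr: int):
    """Return the single-precision divide-by-zero flags for slices 0 to 3."""
    return [bool(fpscr_field(fpscr, 116 + k, 116 + k)) for k in range(4)]
```

For example, an image with only bits 21 and 22 set selects truncation for slice 0 and rounding toward +infinity for slice 1.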
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Floating Add Required v 1.0 
fa rt,ra,rb 
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For each of the four word slots: 
* The operand from register RA is added to the operand from register RB. 
* The result is placed in register RT. 


* If the magnitude of the result is greater than Smax, then Smax (with the correct sign) is produced as the
result. If the magnitude of the result is less than Smin, then zero is produced.
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Required v 1.0 

















dfa rt,ra,rb

Bits     0:10           11:17    18:24    25:31
Field    01011001100    RB       RA       RT











For each of two doubleword slots: 


* The operand from register RA is added to the operand from register RB. 


* The result is placed in register RT. 
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Floating Subtract Required v 1.0 
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For each of the four word slots: 
* The operand from register RB is subtracted from the operand from register RA. 
* The result is placed in register RT. 


* If the magnitude of the result is greater than Smax, then Smax (with the correct sign) is produced as the
result. If the magnitude of the result is less than Smin, then zero is produced.
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Double Floating Subtract Required v 1.0 

















dfs rt,ra,rb

Bits     0:10           11:17    18:24    25:31
Field    01011001101    RB       RA       RT











For each of two doubleword slots: 
* The operand from register RB is subtracted from the operand from register RA. 


* The result is placed in register RT. 
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Floating Multiply Required v 1.0 
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For each of the four word slots: 
* The operand from register RA is multiplied by the operand from register RB. 
* The result is placed in register RT. 


* If the magnitude of the result is greater than Smax, then Smax (with the correct sign) is produced. If the
magnitude of the result is less than Smin, then zero is produced.
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Double Floating Multiply Required v 1.0 
dfm rt,ra,rb

Bits     0:10           11:17    18:24    25:31
Field    01011001110    RB       RA       RT




















For each of two doubleword slots: 
* The operand from register RA is multiplied by the operand from register RB. 
* The result is placed in register RT. 
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Floating Multiply and Add Required v 1.0 
fma rt,ra,rb,rc

Bits     0:3     4:10    11:17    18:24    25:31
Field    1110    RT      RB       RA       RC























For each of the four word slots: 


* The operand from register RA is multiplied by the operand from register RB and added to the operand 
from register RC. The multiplication is exact and not subject to limits on its range. 


* The result is placed in register RT. 


* If the magnitude of the result of the addition is greater than Smax, then Smax (with the correct sign) is
produced. If the magnitude of the result is less than Smin, then zero is produced.
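
The statement that the multiplication is exact and not subject to range limits has an observable consequence: a fused multiply and add can deliver an in-range result even when the product alone would exceed Smax. The Python sketch below contrasts the two cases with exact rational arithmetic. The function names are hypothetical, and truncation to 24 significant bits is omitted to keep the example short.

```python
from fractions import Fraction

SMAX = (2 - Fraction(1, 2**23)) * 2**128
SMIN = Fraction(1, 2**126)


def saturate(x: Fraction) -> Fraction:
    """Apply the extended-range limits to an exact value (no truncation modeled)."""
    magnitude = abs(x)
    if magnitude > SMAX:
        return SMAX if x > 0 else -SMAX
    if magnitude < SMIN:
        return Fraction(0)
    return x


def fma_model(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Fused form: only the final sum is range limited."""
    return saturate(a * b + c)


def fm_then_fa_model(a: Fraction, b: Fraction, c: Fraction) -> Fraction:
    """Separate multiply and add: the product is range limited first."""
    return saturate(saturate(a * b) + c)
```

With a = 2^65, b = 2^64, and c = -2^128, the fused model delivers 2^128, while the separate multiply saturates the product to Smax before the addition and delivers Smax - 2^128.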
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dfma rt,ra,rb 
Bits     0:10           11:17    18:24    25:31
Field    01101011100    RB       RA       RT











For each of two doubleword slots: 


* The operand from register RA is multiplied by the operand from register RB and added to the operand 
from register RT. The multiplication is exact and not subject to limits on its range. 


* The result is placed in register RT. 
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Floating Negative Multiply and Subtract Required v 1.0 
fnms rt,ra,rb,rc

Bits     0:3     4:10    11:17    18:24    25:31
Field    1101    RT      RB       RA       RC
































For each of the four word slots: 


* The operand from register RA is multiplied by the operand from register RB, and the product is subtracted 
from the operand from register RC. The result of the multiplication is exact and not subject to limits on its 
range. 

* The result is placed in register RT. 


* If the magnitude of the result of the subtraction is greater than Smax, then Smax (with the correct sign) is
produced. If the magnitude of the result of the subtraction is less than Smin, then zero is produced.
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Double Floating Negative Multiply and Subtract Required v 1.0 
dfnms rt,ra,rb 


RB RA RT 
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For each of two doubleword slots: 


* The operand from register RA is multiplied by the operand from register RB. The operand from
register RT is subtracted from the product. The result, which is placed in register RT, is usually obtained
by negating the rounded result of this multiply subtract operation. There is one exception: If the result is a
QNaN, the sign bit of the result is zero.


* This instruction produces the same result as would be obtained by using the Double Floating Multiply and 
Subtract instruction and then negates any result that is not a NaN. 


* The multiplication is exact and not subject to limits on its range. 
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Floating Multiply and Subtract Required v 1.0 
fms rt,ra,rb,rc 

Bits     0-3    4-10   11-17   18-24   25-31
Field    1111   RT     RB      RA      RC

For each of the four word slots: 


* The operand from register RA is multiplied by the operand from register RB. The result of the
multiplication is exact and not subject to limits on its range. The operand from register RC is subtracted
from the product.


* The result is placed in register RT. 


* If the magnitude of the result of the subtraction is greater than Smax, then Smax (with the correct sign) is
produced. If the magnitude of the result of the subtraction is less than Smin, then zero is produced.
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dfms rt,ra,rb 
Bits     0-10          11-17   18-24   25-31
Field    01101011101   RB      RA      RT

For each of two doubleword slots: 


* The operand from register RA is multiplied by the operand from register RB. The multiplication is exact 
and not subject to limits on its range. The operand from register RT is subtracted from the product. 


* The result is placed in register RT. 
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Double Floating Negative Multiply and Add Required v 1.0 
dfnma rt,ra,rb 

Bits     0-10          11-17   18-24   25-31
Field    01101011111   RB      RA      RT

For each of two doubleword slots: 


* The operand from register RA is multiplied by the operand from register RB and added to the operand 
from register RT. The multiplication is exact and not subject to limits on its range. The result, which is 
placed in register RT, is usually obtained by negating the rounded result of this multiply add operation. 
There is one exception: If the result is a QNaN, the sign bit of the result is 0. 


* This instruction produces the same result as would be obtained by using the Double Floating Multiply and
Add instruction and then negating any result that is not a NaN. 
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Floating Reciprocal Estimate Required v 1.0 
frest rt,ra 

Bits     0-10          11-17   18-24   25-31
Field    00110111000   ///     RA      RT

For each of four word slots: 


* The operand in register RA is used to compute a base and a step for estimating the reciprocal of the 
operand. The result, in the form shown below, is placed in register RT. S is the sign bit of the base result. 


Bits     0    1-8               9-21           22-31
Field    S    Biased Exponent   BaseFraction   StepFraction

* The base result is expressed as a floating-point number with 13 bits in the fraction, rather than the usual 
23 bits. The remaining 10 bits of the fraction are used to encode the magnitude of the step as a 10-bit 
denormal fraction; the exponent is that of the base. 


* The step fraction differs from the base fraction (and any normalized IEEE fraction) in that there is a ‘0’ in
front of the binary point and three additional bits of ‘0’ between the binary point and the fraction. The
represented numbers are as follows:

Base = (-1)^S * 1.BaseFraction * 2^(BiasedExponent - 127)
Step = 0.000StepFraction * 2^(BiasedExponent - 127)





* Let x be the initial value in register RA. The result placed in RT, which is interpreted as a regular IEEE 
number, provides an estimate of the reciprocal of a nonzero x. 


* If the operand in register RA has a zero exponent, a divide-by-zero exception is flagged.
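
As an illustration of the result format described above, the following sketch splits a 32-bit result word into its base and step values. The function name is hypothetical, and the field positions assume the bit numbering used in this document (bit 0 is the most-significant bit).

```python
def decode_estimate(word):
    """Split a 32-bit estimate word into (base, step) as Python floats."""
    sign = (word >> 31) & 0x1            # bit 0
    biased_exp = (word >> 23) & 0xFF     # bits 1-8
    base_frac = (word >> 10) & 0x1FFF    # bits 9-21, 13 bits
    step_frac = word & 0x3FF             # bits 22-31, 10 bits
    scale = 2.0 ** (biased_exp - 127)
    base = (-1) ** sign * (1 + base_frac / 2**13) * scale
    # 0.000StepFraction: three zero bits, then the 10 step bits.
    step = (step_frac / 2**13) * scale
    return base, step
```

For example, a word with biased exponent 128, BaseFraction x'1000', and StepFraction x'200' decodes to a base of 3.0 and a step of 0.125.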


Programming Note: The result returned by this instruction is intended as an operand for the Floating
Interpolate instruction.


The quality of the estimate produced by the Floating Reciprocal Estimate instruction is sufficient to produce a 
result within 1 ulp of the IEEE single-precision reciprocal after interpolation and a single step of Newton- 
Raphson. Consider this code sequence: 


FREST y0,x          // table-lookup
FI    y1,x,y0       // interpolation
FNMS  t1,x,y1,ONE   // t1 = -(x * y1 - 1.0)
FMA   y2,t1,y1,y1   // y2 = t1 * y1 + y1
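
The FNMS/FMA pair in the sequence above is one Newton-Raphson step for the reciprocal. A minimal sketch of the arithmetic, using Python double-precision floats rather than SPU single precision (so it shows the convergence behavior, not the exact bit results); the function name is hypothetical:

```python
def refine_reciprocal(x, y1):
    """One Newton-Raphson step toward 1/x, starting from the estimate y1."""
    t1 = 1.0 - x * y1      # FNMS t1,x,y1,ONE : residual of the current estimate
    return t1 * y1 + y1    # FMA  y2,t1,y1,y1 : corrected estimate
```

If the relative error of y1 is e, the relative error of the returned value is about e squared, which is why a sufficiently good interpolated estimate needs only a single step.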


Three ranges of input must be described separately:

Zeros   1/0 is defined to give the maximum SPU single-precision extended-range floating-point (sfp)
        number: y2 = x'7FFF FFFF' (1.999 * 2^128)
Big     If |x| ≥ 2^126, then 1/x underflows to zero, y2 = 0.


Note: This underflows for one value of x that IEEE single-precision reciprocal would not. If 
this is a concern, the following code sequence produces the IEEE answer: 


maxnounderflow = 0x7e800000
min = 0x00800000
msb = 0x80000000

FCMEQ selmask,x,maxnounderflow
AND   s1,x,msb
OR    smin,s1,min
SELB  y3,selmask,y2,smin
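
The fix-up sequence above can be followed at the bit level with this sketch, which operates on 32-bit patterns for a single word slot. The function name is hypothetical, and the magnitude compare is modeled as equality of the patterns with the sign bit cleared.

```python
MAXNOUNDERFLOW = 0x7E800000   # 2^126
MIN = 0x00800000              # 2^-126, the smallest normalized magnitude
MSB = 0x80000000              # sign bit


def fix_big(x_bits, y2_bits):
    """Return y3: y2, except that |x| = 2^126 yields +/- 2^-126."""
    # FCMEQ: all ones when |x| equals maxnounderflow, otherwise zero.
    selmask = 0xFFFFFFFF if (x_bits & 0x7FFFFFFF) == MAXNOUNDERFLOW else 0
    s1 = x_bits & MSB                      # AND: keep the sign of x
    smin = s1 | MIN                        # OR: signed minimum value
    # SELB: take smin where the mask is set, y2 elsewhere.
    return (y2_bits & ~selmask & 0xFFFFFFFF) | (smin & selmask)
```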


Normal  1/x = Y where x * Y < 1.0 and x * INC(Y) > 1.0.
        INC(y) gives the sfp number with the same sign as y and next larger magnitude.
        The absolute error bound is:
        |Y - y2| ≤ 1 ulp (either y2 = Y, or INC(y2) = Y)
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Floating Reciprocal Absolute Square Root Estimate Required v 1.0 
frsqest rt,ra 


/// RA RT
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For each of four word slots: 


* The operand in register RA is used to compute a base and step for estimating the reciprocal of the square 
root of the absolute value of the operand. The result is placed in register RT. The sign bit (S) will be zero. 


Bits     0    1-8               9-21           22-31
Field    S    Biased Exponent   BaseFraction   StepFraction

* Let x be the initial value of register RA. The result placed in register RT, interpreted as a regular IEEE 
number, provides an estimate of the reciprocal square root of abs(x). 


* If the operand in register RA has a zero exponent, a divide-by-zero exception is flagged.


Programming Note: The result returned by this instruction is intended as an operand for the Floating 
Interpolate instruction. 


The quality of the estimate produced by the Floating Reciprocal Absolute Square Root Estimate instruction is 
sufficient to produce an IEEE single-precision reciprocal after interpolation and a single step of Newton- 
Raphson. Consider the following code sequence: 


mask = 0x7fffffff
HALF = 0.5
ONE = 1.0

FRSQEST y0,x          // table-lookup
AND     ax,x,mask     // ax = ABS(x)
FI      y1,ax,y0      // interpolation
FM      t1,ax,y1      // t1 = ax * y1
FM      t2,y1,HALF    // t2 = y1 * 0.5
FNMS    t1,t1,y1,ONE  // t1 = -(t1 * y1 - 1.0)
FMA     y2,t1,t2,y1   // y2 = t1 * t2 + y1
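
The refinement part of the sequence above can be sketched as follows. Python double-precision floats stand in for SPU single precision, so the sketch shows the Newton-Raphson arithmetic rather than exact bit results; the function name is hypothetical.

```python
def refine_rsqrt(x, y1):
    """One Newton-Raphson step toward 1/sqrt(|x|), starting from y1."""
    ax = abs(x)            # AND  ax,x,mask
    t1 = ax * y1           # FM   t1,ax,y1
    t2 = y1 * 0.5          # FM   t2,y1,HALF
    t1 = 1.0 - t1 * y1     # FNMS t1,t1,y1,ONE : 1 - ax * y1^2
    return t1 * t2 + y1    # FMA  y2,t1,t2,y1
```

Because the absolute value is taken first, a negative x gives the same result as the corresponding positive x.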


Three ranges of input must be described separately: 


Zeros, where x fraction < 0x000ff53c: y2 = 0x7fffffff (1.999 * 2^128)
Zeros, where x fraction > 0x000ff53c: y2 ≥ 0x7fc00000
The following sequence could be used to correct the answer: 


zero = 0.0
mask = 0x7fffffff

FCMEQ z,x,zero
AND   zmask,z,mask
OR    y3,zmask,y2
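
The correction above can be followed at the bit level with this sketch for one word slot. The function name is hypothetical. It assumes, in line with the compare instructions described later in this section, that any operand with a zero exponent field compares as zero regardless of its fraction and sign.

```python
def fix_zero(x_bits, y2_bits):
    """Return y3: forced to x'7FFFFFFF' when x is a zero, otherwise y2."""
    x_is_zero = (x_bits & 0x7F800000) == 0      # zero exponent field
    z = 0xFFFFFFFF if x_is_zero else 0          # FCMEQ z,x,zero
    zmask = z & 0x7FFFFFFF                      # AND   zmask,z,mask
    return zmask | y2_bits                      # OR    y3,zmask,y2
```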
Normal  1/sqrt(x) = Y where x * Y^2 ≤ 1.0 and x * INC(Y)^2 ≥ 1.0
        INC(y) gives the sfp number with the same sign as y and next larger magnitude.
        The absolute error bound is:
        |Y - y2| ≤ 1 ulp (0 and +1 are all possible)
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Floating Interpolate Required v 1.0 
fi rt,ra,rb 

Bits     0-10          11-17   18-24   25-31
Field    01111010100   RB      RA      RT

For each of four word slots: 


* The operand in register RB is disassembled to produce a floating-point base and step according to the 
format described in Floating Reciprocal Estimate on page 215; that is, a sign, biased exponent, base 
fraction, and step fraction. 

* Bits 13 to 31 of register RA are taken to represent a fraction, Y, whose binary point is to the left of bit 13;
that is, Y ← 0.RA_{13:31}.


The result is computed by the following equation:
RT ← (-1)^S * (1.BaseFraction - 0.000StepFraction * Y) * 2^(BiasedExponent - 127)
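
The interpolation equation can be sketched for one word slot as follows. The function name is hypothetical, the result is returned as a Python float rather than an SPU single-precision bit pattern, and bit 0 is taken to be the most-significant bit as elsewhere in this document.

```python
def fi_slot(ra_bits, rb_bits):
    """Evaluate (-1)^S * (1.BaseFraction - 0.000StepFraction * Y) * 2^(exp - 127)."""
    sign = (rb_bits >> 31) & 0x1
    biased_exp = (rb_bits >> 23) & 0xFF
    base_frac = (rb_bits >> 10) & 0x1FFF     # 13 bits
    step_frac = rb_bits & 0x3FF              # 10 bits
    y = (ra_bits & 0x7FFFF) / 2**19          # bits 13-31 of RA as 0.xxx (19 bits)
    mantissa = (1 + base_frac / 2**13) - (step_frac / 2**13) * y
    return (-1) ** sign * mantissa * 2.0 ** (biased_exp - 127)
```

With a base of 1.0, a StepFraction of x'200' (a step of 0.0625), and Y = 0.5, the result is 1.0 - 0.0625 * 0.5 = 0.96875.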


Programming Note: If the operand in register RB is the result of an frest or frsqest instruction with the
operand from register RA, then the result of the fi instruction placed in register RT provides a more accurate
estimation.
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Convert Signed Integer to Floating Required v 1.0 
csflt rt,ra,scale 

Bits     0-9          10-17   18-24   25-31
Field    0111011010   I8      RA      RT

For each of four word slots: 


* The signed 32-bit integer value in register RA is converted to an extended-range, single-precision,
floating-point value.


* The result is divided by 2^scale and placed in register RT. The factor scale is an 8-bit unsigned integer
provided by 155 minus the unsigned value from the I8 field. If the value scale is not in the range of 0 to 127,
the result of the operation is undefined.


* The scale factor describes the number of bit positions between the binary point of the magnitude and the 
right end of register RA. A scale factor of zero means that the register RA value is an unscaled integer. 
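
The relationship between the I8 field, the scale factor, and the result can be sketched for one word slot as follows. The function name is hypothetical, the undefined case is modeled as an exception purely for illustration, and rounding to single precision is not modeled.

```python
def csflt_slot(value, i8):
    """Convert a signed integer to floating point and divide by 2^scale."""
    scale = 155 - i8                       # scale is derived from the I8 field
    if not 0 <= scale <= 127:
        raise ValueError("scale outside 0..127: result is undefined")
    return value / 2.0 ** scale            # single-precision rounding not modeled
```

For example, an I8 value of 155 gives a scale of 0 (an unscaled integer), and an I8 value of 147 gives a scale of 8, so the integer 256 converts to 1.0.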
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Convert Floating to Signed Integer Required v 1.0 
cflts rt,ra,scale

Bits     0-9          10-17   18-24   25-31
Field    0111011000   I8      RA      RT

For each of four word slots: 


* The extended-range, single-precision, floating-point value in register RA is multiplied by 2^scale. The factor
scale is an 8-bit unsigned integer provided by 173 minus the unsigned value from the I8 field. If the value
scale is not in the range of 0 to 127, the result of the operation is undefined.


* The product is converted to a signed 32-bit integer. If the intermediate result is greater than (2^31 - 1), it
saturates to (2^31 - 1); if it is less than -2^31, it saturates to -2^31. The resulting signed integer is placed in
register RT.


* The scale factor is the location of the binary point of the result, expressed as the number of bit positions 
from the right end of the register RT. A scale factor of zero means that the value in register RT is an 
unscaled integer. 
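
The scaling and saturation steps can be sketched for one word slot as follows. The function name is hypothetical, and the conversion of an in-range product is assumed to truncate toward zero, which this description does not state.

```python
import math

INT32_MAX = 2**31 - 1
INT32_MIN = -2**31


def cflts_slot(value, i8):
    """Multiply by 2^scale, then convert to a saturated signed 32-bit integer."""
    scale = 173 - i8                       # scale is derived from the I8 field
    if not 0 <= scale <= 127:
        raise ValueError("scale outside 0..127: result is undefined")
    product = value * 2.0 ** scale
    if product > INT32_MAX:
        return INT32_MAX                   # saturate high
    if product < INT32_MIN:
        return INT32_MIN                   # saturate low
    return math.trunc(product)             # assumption: truncation toward zero
```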
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Convert Unsigned Integer to Floating Required v 1.0 








cuflt rt,ra,scale 

Bits     0-9          10-17   18-24   25-31
Field    0111011011   I8      RA      RT

For each of four word slots: 


* The unsigned 32-bit integer value in register RA is converted to an extended-range, single-precision, 
floating-point value. 


* The result is divided by 2^scale and placed in register RT. The factor scale is an 8-bit unsigned integer
provided by 155 minus the unsigned value from the I8 field. If the value scale is not in the range of 0 to 127,
the result of the operation is undefined.


* The scale factor describes the number of bit positions between the binary point of the magnitude and the 
right end of register RA. A scale factor of zero means that the register RA value is an unscaled integer. 
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Convert Floating to Unsigned Integer Required v 1.0 
cfltu rt,ra,scale 

Bits     0-9          10-17   18-24   25-31
Field    0111011001   I8      RA      RT

For each of four word slots: 


* The extended-range, single-precision, floating-point value in register RA is multiplied by 2^scale. The factor
scale is an 8-bit unsigned integer provided by 173 minus the unsigned value from the I8 field. If the value
scale is not in the range of 0 to 127, the result of the operation is undefined.


* The product is converted to an unsigned 32-bit integer. If the intermediate result is greater than (2^32 - 1), it
saturates to (2^32 - 1). If the product is negative, it saturates to zero. The resulting unsigned integer is
placed in register RT.


* The scale factor is the location of the binary point of the result, expressed as the number of bit positions
from the right end of the register RT. A scale factor of zero means that the value in RT is an unscaled
integer.


Version 1.2 Floating-Point Instructions 
January 27, 2007 Page 223 of 278 


Floating Round Double to Single Required v 1.0 
frds rt,ra 

Bits     0-10          11-17   18-24   25-31
Field    01110111001   ///     RA      RT

For each of two doubleword slots: 


* The double-precision value in register RA is rounded to a single-precision, floating-point value and placed
in the left word slot. The conversions are done as described in Section 9.2.1 Conversions Between
Single-Precision and Double-Precision Format on page 198. Zeros are placed in the right word slot.


* The rounding is performed in accordance with the rounding mode specified in the Floating-Point Status 
Register. Double-precision exceptions are detected and accumulated in the Floating-Point Unit (FPU) 
Status Register. 
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Floating Extend Single to Double Required v 1.0 


fesd rt,ra 


/// RA RT
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For each of two doubleword slots: 


* The single-precision value in the left slot of register RA is converted to a double-precision, floating-point
value and placed in register RT. The conversions are done as described in Section 9.2.1 Conversions
Between Single-Precision and Double-Precision Format on page 198. The contents of the right word slot
are ignored.


* Double-precision exceptions are detected and accumulated in the FPU Status Register. 
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Double Floating Compare Equal Optional v 1.2 
dfceq rt,ra,rb 

Bits     0-10          11-17   18-24   25-31
Field    01111000011   RB      RA      RT

For each of the two doubleword slots: 


* The double-precision floating-point value from register RA is compared with the double-precision floating- 
point value from register RB. If the values are equal, a result of all ones (true) is produced in register RT. 
Otherwise, a result of zero (false) is produced in register RT. 


* Two zeros always compare equal independent of their signs. 


* A NaN compares false against all other operands. Even two NaNs with identical bit patterns generate 
false. 


* When accessing a NaN, the corresponding INV exception bit in the FPSCR is set. 
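
The comparison rules above can be sketched for one doubleword slot as follows. Python floats are IEEE double-precision values, so the zero and NaN cases map directly. The function name is hypothetical, and the setting of the INV exception bit is not modeled.

```python
import math

ALL_ONES = 0xFFFFFFFFFFFFFFFF


def dfceq_slot(a, b):
    """Return all ones when a equals b, otherwise zero."""
    if math.isnan(a) or math.isnan(b):
        return 0                 # a NaN compares false against every operand
    return ALL_ONES if a == b else 0   # +0.0 == -0.0 is true
```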
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Double Floating Compare Magnitude Equal Optional v 1.2 
dfcmeq rt,ra,rb 

Bits     0-10          11-17   18-24   25-31
Field    01111001011   RB      RA      RT

For each of the two doubleword slots: 


* The absolute value of the double-precision floating-point number in register RA is compared with the 
absolute value of the double-precision floating-point number in register RB. If the absolute values are 
equal, a result of all ones (true) is produced in register RT. Otherwise, a result of zero (false) is produced 
in register RT. 


* Two zeros always compare equal independent of their signs. 


* A NaN compares false against all other operands. Even two NaNs with identical bit patterns generate 
false. 


* When accessing a NaN, the corresponding INV exception bit in the FPSCR is set. 
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Double Floating Compare Greater Than Optional v 1.2 
dfcgt rt,ra,rb 

Bits     0-10          11-17   18-24   25-31
Field    01011000011   RB      RA      RT

For each of the two doubleword slots: 


* The double-precision floating-point value in register RA is compared with the double-precision floating- 
point value in register RB. If the value in RA is greater than the value in RB, a result of all ones (true) is 
produced in register RT. Otherwise, a result of zero (false) is produced in register RT. 


* Two zeros never compare greater than, independent of their sign bits. 


* A NaN compares false against all other operands. Even two NaNs with identical bit patterns generate 
false. 


* When accessing a NaN, the corresponding INV exception bit in the FPSCR is set. 
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Double Floating Compare Magnitude Greater Than Optional v 1.2 
dfcmgt rt,ra,rb 
RB RA RT 
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For each of the two doubleword slots: 


* The absolute value of the double-precision floating-point number in register RA is compared with the 
absolute value of the double-precision floating-point number in register RB. If the absolute value of the 
value from register RA is greater than the absolute value of the value from register RB, a result of all ones 
(true) is produced in register RT. Otherwise, a result of zero (false) is produced in register RT. 


* Two zeros never compare greater than, independent of their signs. 


* A NaN compares false against all other operands. Even two NaNs with identical bit patterns generate 
false. 


* When accessing a NaN, the corresponding INV exception bit in the FPSCR is set. 



Double Floating Test Special Value Optional v 1.2 
dftsv rt,ra,value 
I7 RA RT 
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For each of two doubleword slots: 


* The double-precision floating-point value in register RA is tested for special values. The bits of I7 enable
the following seven checks:
































I7        RA Value Category
1000000   NaN
0100000   Infinity
0010000   -Infinity
0001000   +0
0000100   -0
0000010   Positive Denorm
0000001   Negative Denorm








* If one or more of the enabled checks is true, a result of all ones is produced in register RT. When none of 
the enabled checks is met, a result of all zeros is produced in register RT. 
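
The seven checks can be sketched for one doubleword slot as follows. The function name is hypothetical, and the "Infinity" category is assumed to mean positive infinity, since a separate check exists for -Infinity.

```python
import math

ALL_ONES = 0xFFFFFFFFFFFFFFFF


def dftsv_slot(value, i7):
    """Return all ones if any check enabled in i7 matches value, otherwise zero."""
    neg = math.copysign(1.0, value) < 0
    finite = not math.isnan(value) and not math.isinf(value)
    denorm = finite and value != 0.0 and abs(value) < 2.0 ** -1022
    checks = [
        (0b1000000, math.isnan(value)),
        (0b0100000, math.isinf(value) and not neg),   # assumed: +Infinity
        (0b0010000, math.isinf(value) and neg),
        (0b0001000, value == 0.0 and not neg),
        (0b0000100, value == 0.0 and neg),
        (0b0000010, denorm and not neg),
        (0b0000001, denorm and neg),
    ]
    hit = any((i7 & bit) and cond for bit, cond in checks)
    return ALL_ONES if hit else 0
```

Several bits can be set at once; for example, an I7 value of 0110000 tests for an infinity of either sign.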
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Floating Compare Equal Required v 1.0 
fceq rt,ra,rb 

Bits     0-10          11-17   18-24   25-31
Field    01111000010   RB      RA      RT

For each of four word slots: 


* The floating-point value from register RA is compared with the floating-point value from register RB. If the 
values are equal, a result of all ones (true) is produced in register RT. Otherwise, a result of zero (false) is 
produced in register RT. Two zeros always compare equal independent of their fractions and signs. 


* This instruction is always executed in extended-range mode and ignores the setting of the mode bit. 
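
The zero-handling rule can be sketched on 32-bit patterns for one word slot as follows. The function name is hypothetical. The sketch assumes that a "zero" in extended-range mode is any operand whose exponent field is zero, and that every other bit pattern represents a distinct value, so nonzero operands are equal only when their patterns are identical.

```python
ALL_ONES = 0xFFFFFFFF


def fceq_slot(a_bits, b_bits):
    """Return all ones when the two extended-range operands compare equal."""

    def is_zero(bits):
        return (bits & 0x7F800000) == 0     # zero exponent field

    if is_zero(a_bits) and is_zero(b_bits):
        return ALL_ONES                     # zeros match regardless of fraction and sign
    return ALL_ONES if a_bits == b_bits else 0
```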
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Floating Compare Magnitude Equal Required v 1.0 
fcmeq rt,ra,rb 

Bits     0-10          11-17   18-24   25-31
Field    01111001010   RB      RA      RT

For each of four word slots: 


* The absolute value of the floating-point number in register RA is compared with the absolute value of the 
floating-point number in register RB. If the absolute values are equal, a result of all ones (true) is
produced in register RT. Otherwise, a result of zero (false) is produced in register RT. Two zeros always
compare equal independent of their fractions and signs.


* This instruction is always executed in extended-range mode and ignores the setting of the mode bit. 
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Floating Compare Greater Than Required v 1.0 
fcgt rt,ra,rb 


RB RA RT 


1 0 
RUE" y v af 
9 10/11 12 13 14 15 16 17/18 19 20 21 22 23 24,25 26 27 28 29 30 31 



























For each of four word slots: 


* The floating-point value in register RA is compared with the floating-point value in register RB. If the value 
in RA is greater than the value in RB, a result of all ones (true) is produced in register RT. Otherwise, a 
result of zero (false) is produced in register RT. Two zeros never compare greater than independent of 
their sign bits and fractions. 


* This instruction is always executed in extended-range mode, and ignores the setting of the mode bit. 
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Floating Compare Magnitude Greater Than Required v 1.0 
fcmgt rt,ra,rb 


RB RA RT 
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For each of four word slots: 


* The absolute value of the floating-point number in register RA is compared with the absolute value of the
floating-point number in register RB. If the absolute value of the value from register RA is greater than the
absolute value of the value from register RB, a result of all ones (true) is produced in register RT.
Otherwise, a result of zero (false) is produced in register RT. Two zeros never compare greater than,
independent of their fractions and signs.


* This instruction is always executed in extended-range mode, and ignores the setting of the mode bit. 
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Floating-Point Status and Control Register Write Required v 1.0 
fscrwr ra 

Bits     0-10          11-17   18-24   25-31
Field    01110111010   ///     RA      RT

The 128-bit value of register RA is written into the FPSCR. The value of the unused bits in the FPSCR is 
undefined. RT is a false target. Implementations can schedule instructions as though this instruction 
produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not to appear to 
source data for nearby subsequent instructions. False targets are not written. 
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Floating-Point Status and Control Register Read 
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This instruction reads the value of the FPSCR. In the result, the unused bits of the FPSCR are forced to zero. 


The result is placed in the register RT. 
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10. Control Instructions 


This section lists and describes the SPU control instructions. 
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Stop and Signal Required v 1.0 
stop 

Bits     0-10          11-17   18-31
Field    00000000000   ///     Stop and Signal Type

Execution of the program in the SPU stops, and the external environment is signaled. No further instructions 
are executed. 





PC ← (PC + 4) & LSLR
precise stop 
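
The program counter update can be sketched as follows. The function name is hypothetical, and the LSLR value used in the example (x'3FFFF', corresponding to a 256 KB local storage) is an assumption chosen only for illustration.

```python
def next_pc(pc, lslr):
    """Advance to the next sequential instruction, wrapping at the storage limit."""
    return (pc + 4) & lslr


# Example with an assumed LSLR of 0x3FFFF: the last word address wraps to 0.
assert next_pc(0x3FFFC, 0x3FFFF) == 0
```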
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Required v 1.0 
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Execution of the program in the SPU stops. 





PC ← (PC + 4) & LSLR
precise stop 








Programming Note: This instruction differs from stop only in that, in typical implementations, instructions 
with dependencies can be replaced with stopd to create a breakpoint without affecting the instruction timings. 
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No Operation (Load) 
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This instruction has no effect on the execution of the program. It exists to provide implementation-defined 
control of instruction issuance.
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No Operation (Execute) Required v 1.0 
nop 

Bits     0-10          11-17   18-24   25-31
Field    01000000001   ///     ///     RT

This instruction has no effect on the execution of the program. It exists to provide implementation-defined 
control of instruction issuance. RT is a false target. Implementations can schedule instructions as though this 
instruction produces a value into RT. Programs can avoid unnecessary delay by programming RT so as not 
to appear to source data for nearby subsequent instructions. False targets are not written. 
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Synchronize Required v 1.0 








sync 
Bits     0-10          11   12-31
Field    00000000010   C    ///

This instruction has no effect on the execution of the program other than to cause the processor to wait until 
all pending store instructions have completed before fetching the next sequential instruction. This instruction 
must be used following a store instruction that modifies the instruction stream. 


The C feature bit causes channel synchronization to occur before instruction synchronization occurs. 
Channel synchronization allows an SPU state modified through channel instructions to affect execution. 
Synchronization is discussed in more detail in Section 13 Synchronization and Ordering on page 253. 
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Synchronize Data Required v 1.0 
dsync 

Bits     0-10          11-17   18-24   25-31
Field    00000000011   ///     ///     ///

This instruction forces all earlier load, store, and channel instructions to complete before proceeding. No 
subsequent load, store, or channel instructions can start until the previous instructions complete. The dsync 
instruction allows SPU software to ensure that the local storage data would be consistent if it were observed 
by another entity. This instruction does not affect any prefetching of instructions that the processor might 
have done. Synchronization is discussed in more detail in Section 13 Synchronization and Ordering on
page 253.
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Move from Special-Purpose Register Required v 1.0 
mfspr rt,sa 

Bits     0-10          11-17   18-24   25-31
Field    00000001100   ///     SA      RT

Special-Purpose Register SA is copied into register RT. If SPR SA is not defined, zeros are supplied. 


Note: The SPU ISA defines the mtspr and mfspr instructions as 128-bit operations. An implementation 
might define 32-bit wide registers. In that case, the 32-bit value occupies the preferred slot; the other slots 
return zeros. 





if defined(SPR(SA)) then RT ← SPR(SA)
else RT ← 0
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Move to Special-Purpose Register Required v 1.0 
mtspr sa, rt 

Bits     0-10          11-17   18-24   25-31
Field    00100001100   ///     SA      RT

The contents of register RT are written to Special-Purpose Register SA. If SPR SA is not defined, no operation
is performed.


Note: The SPU ISA defines the mtspr and mfspr instructions as 128-bit operations. An implementation 
might define 32-bit wide registers. In that case, the 32-bit value of the preferred slot is used; values in the 
other slots are ignored. 


if defined(SPR(SA)) then
    SPR(SA) ← RT
else
    do nothing
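
The behavior of mfspr and mtspr for defined and undefined registers can be sketched with a small model. The class and method names are hypothetical, and registers are modeled as 128-bit integers that start at zero.

```python
class SprFile:
    """Toy model of the special-purpose registers of one SPU."""

    def __init__(self, defined):
        self._regs = {sa: 0 for sa in defined}    # only these SPR numbers exist

    def mfspr(self, sa):
        """Return SPR(sa), or zero when the register is not defined."""
        return self._regs.get(sa, 0)

    def mtspr(self, sa, value):
        """Write SPR(sa); a write to an undefined register does nothing."""
        if sa in self._regs:
            self._regs[sa] = value & (2**128 - 1)
```

A write followed by a read of an undefined register therefore returns zero, not the value written.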
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11. Channel Instructions 


The SPU provides an input/output interface based on message passing called the channel interface. This 
section describes the instructions used to communicate between the SPU and external devices through the 
channel interface. 


Channels are 128-bit wide communication paths between the SPU and external devices. Each channel
operates in one direction only, and is called either a read channel or a write channel, according to the operation
that the SPU can perform on the channel. Instructions are provided that allow the SPU program to read from 
or write to a channel; the operations performed must match the type of channel addressed. 


An implementation can implement any number of channels up to 128. Each channel has a channel number in 
the range 0-127. Channel numbers have no particular significance, and there is no relationship between the 
direction of a channel and its number. 


The channels and the external devices have capacity. Channel capacity is the minimum number of reads or 
writes that can be performed without delay. Attempts to access a channel without capacity cause instruction 
processing to cease until capacity becomes available and the access can complete. The SPU maintains 
counters to measure channel capacity and provides an instruction to read channel capacity. 


As long as capacity is available, the channels and external devices can service a burst of SPU accesses 
without requiring the SPU to delay execution. An attempt to write to a channel beyond its capacity causes the 
SPU to hang until the external device empties the channel. An attempt to read from a channel when it is 
empty also causes the SPU to hang until the device inserts data into the channel. 
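
The capacity behavior of a write channel can be sketched with a small queue model. The class and method names are hypothetical, and a stall is modeled as a rejected write (returning False) instead of blocking, so that the example stays single-threaded.

```python
from collections import deque


class WriteChannel:
    """Toy model of a write channel with a fixed number of entries."""

    def __init__(self, depth):
        self.depth = depth
        self.queue = deque()

    def capacity(self):
        """Number of writes that can be performed without delay."""
        return self.depth - len(self.queue)

    def try_write(self, value):
        """Write if capacity is available; return False where the SPU would stall."""
        if self.capacity() == 0:
            return False
        self.queue.append(value)
        return True

    def device_take(self):
        """The external device removes the oldest entry, freeing capacity."""
        return self.queue.popleft()
```

A burst of writes up to the channel depth succeeds immediately; the next write succeeds only after the device has taken an entry.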
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Read Channel Required v 1.0 
rdch rt,ca 

Bits     0-10          11-17   18-24   25-31
Field    00000001101   ///     CA      RT

The SPU waits for data to become available in channel CA (capacity is available). When data is available to 
the channel, it is moved from the channel and placed into register RT. 


If the channel designated by the CA field is not a valid, readable channel, the SPU will stop on or after the 
rdch instruction. 


Note: The SPU ISA defines the rdch and wrch instructions as 128-bit operations. An implementation might 
define 32-bit wide channels. In that case, the 32-bit value occupies the preferred slot; the other slots return 
zeros. 





if readable(Channel(CA)) then
    RT ← Channel(CA)
else
    Stop after executing zero or more instructions after the rdch.
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Read Channel Count Required v 1.0 
rchcnt rt,ca 

Bits     0-10          11-17   18-24   25-31
Field    00000001111   ///     CA      RT

The channel capacity of channel CA is placed into the preferred slot of register RT. The channel capacity of 
unimplemented channels is zero. 





RT^{0:3} ← Channel Capacity(CA)
RT^{4:15} ← 0
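
The register layout produced by rchcnt can be sketched as a 16-byte value. The function name is hypothetical; the preferred slot is taken to be bytes 0 to 3, stored most-significant byte first.

```python
def rchcnt_result(capacity):
    """Build the 16-byte RT value: capacity in bytes 0-3, zeros in bytes 4-15."""
    return capacity.to_bytes(4, "big") + bytes(12)
```

For an unimplemented channel the capacity is zero, so all 16 bytes are zero.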
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Write Channel Required v 1.0 








wrch ca,rt 

Bits     0-10          11-17   18-24   25-31
Field    00100001101   ///     CA      RT

The SPU waits for capacity to become available in channel CA before executing the wrch instruction. When 
capacity is available in the channel, the contents of register RT are placed into channel CA. Channel writes 
targeting channels that are not valid writable channels cause the SPU to stop on or after the wrch instruction. 


Note: The SPU ISA defines the rdch and wrch instructions as 128-bit operations. An implementation might 
define 32-bit wide channels. In that case, the 32-bit value of the preferred slot is used; values of the other 
slots are ignored. 





if writable(Channel(CA)) then
    Channel(CA) ← RT
else
    Stop after executing zero or more instructions after the wrch.
12. SPU Interrupt Facility


This section describes the SPU interrupt facility. 


External conditions are monitored and managed through external facilities that are controlled through the 
channel interface. External conditions can affect SPU instruction sequencing through the following facilities: 


* The bisled instruction
* The interrupt facility


The bisled instruction tests for the existence of an external condition and branches to a target if it is
present. The bisled instruction allows the SPU software to poll for external conditions and to call a handler
subroutine, if one is present. When polling is not required, the SPU can be enabled to interrupt normal
instruction processing and to vector to a handler subroutine when an external condition appears.


The following indirect branch instructions allow software to enable and disable the interrupt facility during
critical subroutines:


* bi
* bisl
* bisled
* biz
* binz
* bihz
* bihnz

All of these branch instructions provide the [D] and [E] feature bits. When one of these branches is taken, the 
interrupt-enable status changes before the target instruction is executed. Table 12-1 describes the feature bit 
settings and their results. 


Table 12-1. Feature Bits [D] and [E] Settings and Results

[D]   [E]   Result
0     0     Status does not change.
0     1     Interrupt processing is enabled.
1     0     Interrupt processing is disabled.
1     1     Causes undefined behavior.
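
Table 12-1 can be read as a small state-update function. The Python sketch below is illustrative only; the function name is hypothetical, and the undefined [D]=1, [E]=1 case is modeled by raising an exception, which is a choice made for this example and not behavior defined by the architecture.

```python
def next_interrupt_enable(enabled, d, e):
    """Return the interrupt-enable status after a taken indirect branch.

    enabled: current interrupt-enable status (bool)
    d, e:    the [D] and [E] feature bits of the branch (0 or 1)
    """
    if d and e:
        # Table 12-1: this combination causes undefined behavior.
        raise ValueError("[D]=1 with [E]=1 causes undefined behavior")
    if e:
        return True    # interrupt processing is enabled
    if d:
        return False   # interrupt processing is disabled
    return enabled     # status does not change
```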














12.1 SPU Interrupt Handler 


The SPU supports a single interrupt handler. The entry point for this handler is address 0 in local storage.
When a condition is present and interrupts are enabled, the SPU branches to address 0 and disables the
interrupt facility. The address of the next instruction to be executed is saved in the SRR0 register. The iret
instruction can be used to return from the handler. iret branches indirectly to the address held in the SRR0
register. iret, like the other indirect branches, has an [E] feature bit that can be used to re-enable interrupts.
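
The handler entry and return sequence can be sketched as a small state machine. This Python model is an assumption-laden illustration: the `SpuState` class and both function names are hypothetical, and only the behavior stated above (branch to address 0, save the next address in SRR0, disable interrupts, return through iret) is modeled.

```python
from dataclasses import dataclass


@dataclass
class SpuState:
    """Hypothetical subset of SPU state relevant to interrupts."""
    pc: int = 0
    srr0: int = 0
    interrupts_enabled: bool = False


def advance(state, next_pc, condition_present):
    """Move to next_pc, or enter the handler if an interrupt is taken."""
    if condition_present and state.interrupts_enabled:
        state.srr0 = next_pc              # address of the next instruction
        state.pc = 0                      # handler entry point
        state.interrupts_enabled = False  # facility is disabled on entry
        return True
    state.pc = next_pc
    return False


def iret(state, e=False):
    """Return from the handler; the [E] bit re-enables interrupts."""
    state.pc = state.srr0
    if e:
        state.interrupts_enabled = True
```

Because the facility is disabled on entry, the handler itself is not interrupted unless it re-enables interrupts explicitly.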
12.2 SPU Interrupt Facility Channels


The interrupt facility uses several channels for configuration, state observation, and state restoration. The
current value of SRR0 can be read from the SPU_RdSRR0 channel, and the SPU_WrSRR0 channel
provides write access to SRR0. When SRR0 is written by wrch 14, synchronization is required to ensure that
this new value is available to the iret instruction. This synchronization is provided by executing the sync
instruction with the [C], or Channel Sync, feature bit set. Without this synchronization, iret instructions
executed after wrch 14 instructions branch to unpredictable addresses. The SPU_RdSRR0 and
SPU_WrSRR0 channels support nested interrupts by allowing software to save and restore SRR0 to a save
area in local storage.



13. Synchronization and Ordering 


The SPU provides a sequentially ordered programming model so that, with a few exceptions, all previous 
instructions appear to be finished before the next instruction is started. 


Systems including an SPU often feature external devices with direct local storage access. Figure 13-1 shows 
a common organization where the external devices also communicate with the SPU via the channel interface. 
These systems are shared memory multiprocessors with message passing. 


Figure 13-1. Systems with Multiple Accesses to Local Storage 
Table 13-1 defines five transactions serviced by local storage. The SPU ISA does not define the behavior of
the external device or how the external device accesses local storage. When this document refers to an
external write of local storage, it assumes the external device delivers data to local storage such that a
subsequent SPU load from local storage can retrieve the data.


Table 13-1. Local Storage Accesses

Name        Description
Load        SPU load instruction gets data from local storage read.
Store       SPU store instruction sends data to local storage write.
Fetch       SPU instruction fetch gets data from local storage read.
ExtWrite    External device sends data to local storage write.
ExtRead     External device gets data from local storage read.














Interaction between local storage accesses of the external devices and those of the SPU can expose effects 
of SPU implementation-specific reordering, speculation, buffering, and caching. This section discusses how 
to order sequences of these transactions to obtain consistent results. 
13.1 Speculation, Reordering, and Caching SPU Local Storage Access


SPU local storage access is weakly consistent (see PowerPC Virtual Environment Architecture, Book II).
Therefore, the sequential execution model, as applied to instructions that cause storage accesses,
guarantees only that those accesses appear to be performed in program order with respect to the SPU
executing the instructions. These accesses might not appear to be performed in program order with respect
to external local storage accesses or with respect to the SPU instruction fetch. This means that, in the
absence of external local storage writes, an SPU load from any particular address returns the data written by
the most recent SPU store to that address. However, an instruction fetch from that address does not
necessarily return that data.


The SPU is allowed to cache, buffer, and otherwise reorder its local storage accesses. SPU loads, stores, 
and instruction fetches might or might not access the local storage. The SPU can speculatively read the local 
storage. That is, the SPU can read the local storage on behalf of instructions that are not required by the 
program. The SPU does not speculatively write local storage. If and when the SPU stores access local 
storage, the SPU only writes local storage on behalf of stores required by the program. Instruction fetches, 
loads, and stores can access local storage in any order. 


13.2 SPU Internal Execution State 


The channel interface can be used to modify the SPU internal execution state. An internal execution state is
any state within an SPU, but outside local storage, that is modified through the channel interface and that can
affect the sequence or execution of instructions. For example, programs can change SRR0 by writing the
SPU_WrSRR0 channel, so SRR0 is part of the internal execution state. State changes made through the
channel interface might not be synchronized with SPU program execution.


13.3 Synchronization Primitives 


The SPU provides three synchronization instructions: dsync, sync, and sync.c. These instructions have 
both consistency and instruction serializing effects, as shown in Table 13-2 Synchronization Instructions on 
page 255. Programs can use the consistency effects of these primitives to ensure that the local storage state 
is consistent with SPU loads and stores. The instruction serializing effects allow the SPU program to order its 
local storage access. 


The dsync instruction orders loads, stores, and channel accesses but not instruction fetches. When a dsync 
completes, the SPU will have completed all prior loads, stores, and channel accesses and will not have 
begun execution of any subsequent loads, stores, or channel accesses. At this time, an external read from a 
local storage address returns the data stored by the most recent SPU store to that address. SPU loads after 
the dsync return the data externally written before the moment when the dsync completes. The dsync 
instruction affects only SPU instruction sequencing and the consistency of loads and stores with respect to 
actual local storage state. The SPU does not broadcast dsync notification to external devices that access 
local storage, and, therefore, does not affect the state of the external devices. 


The sync instruction is much like dsync, but it also orders instruction fetches. Instruction fetches from a local 
storage address after a sync instruction return data stored by the most recent store instruction or external 
write to that address. The sync.c instruction builds upon the sync instruction. It ensures that the effects upon 
the internal state caused by prior wrch instructions are propagated and influence the execution of the 
following instructions. SPU execution begins with a start event and ends with a stop event. Both start and 
stop perform sync.c. 
Table 13-2. Synchronization Instructions

Instruction: dsync
  Consistency Effects:
  * Ensures that subsequent external reads access data written by prior stores.
  * Ensures that subsequent loads access data written by external writes.
  Instruction Serialization Effects:
  * Forces load and store access of local storage because of instructions before the dsync to be completed
    before completion of dsync.
  * Forces read channel operations because of instructions before the dsync to be completed before
    completion of the dsync.
  * Forces load and store access of local storage because of instructions after the dsync to occur after
    completion of the dsync.
  * Forces read and write channel operations because of instructions after the dsync to occur after
    completion of the dsync.

Instruction: sync
  Consistency Effects:
  * Ensures that subsequent external reads access data written by prior stores.
  * Ensures that subsequent instruction fetches access data written by prior stores and external writes.
  * Ensures that subsequent loads access data written by external writes.
  Instruction Serialization Effects:
  * Forces all access of local storage and channels because of instructions before the sync to be completed
    before completion of sync.
  * Forces all access of local storage and channels because of instructions after the sync to occur after
    completion of the sync.

Instruction: sync.c
  Consistency Effects:
  * Ensures that subsequent external reads access data written by prior stores.
  * Ensures that subsequent instruction fetches access data written by prior stores and external writes.
  * Ensures that subsequent loads access data written by external writes.
  * Ensures that subsequent instruction processing is influenced by all internal execution states modified by
    previous wrch instructions.
  Instruction Serialization Effects:
  * Forces all access of local storage and channels because of instructions before the sync.c to be
    completed before completion of sync.c.
  * Forces all access of local storage and channels because of instructions after the sync.c to occur after
    completion of the sync.c.











Table 13-3 indicates which synchronization primitives are required between actions that modify local storage 
and other reads and writes of local storage. SPU programs do not require synchronization primitives between 
their own load and store instructions in order for load instructions to get the data stored by the last preceding 
store instruction. 


However, a program that stores into the instruction stream must execute a sync instruction before it reaches 
the newly stored instructions. The sync instruction forces the instruction fetch to read the instructions after 
the last store before the sync instruction. Without the sync instruction, the SPU might or might not execute 
the newly stored instruction. The SPU might execute the instruction in local storage at the time of the last 
sync event. 


When an external access of local storage occurs, and it is clear that the external access is before or after a 
particular SPU access of local storage, synchronization is required to force the data to move between the 
SPU and the external device. Without synchronization, the external device might see a local storage state 
that is inconsistent with any point of execution in the SPU program. 


For example, if an SPU program is to send data through local storage to an external reader, it must store the 
data and then execute a dsync instruction. If the external read occurs after the dsync instruction, it will read 
the stored data. If an SPU program is to load data put into local storage by an external writer, it must first 
execute a dsync instruction before it executes the load instruction. If the dsync instruction executes after the 
external write, the subsequent load instructions will be able to read the data stored by the external writer. 
Table 13-3. Synchronizing Multiple Accesses to Local Storage

            Local Storage Access to be Synchronized with the Local Storage Write
Writer      Store      Load       Fetch      ExtRead    ExtWrite
Store       nothing    nothing    sync       dsync      dsync
ExtWrite    dsync      dsync      sync       N/A        N/A








Note: The SPU ISA does not define how external readers and writers should order their accesses to local storage. Table 13-3 shows 
entries that relate to external readers and writers as “N/A.” 
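
As a quick reference, Table 13-3 can be encoded as a lookup table. The Python sketch below is only an illustration; the dictionary and function names are hypothetical, and "nothing" in the table is represented as None.

```python
# (writer, access to be synchronized) -> required primitive, per Table 13-3.
REQUIRED_SYNC = {
    ("Store", "Store"): None,
    ("Store", "Load"): None,
    ("Store", "Fetch"): "sync",
    ("Store", "ExtRead"): "dsync",
    ("Store", "ExtWrite"): "dsync",
    ("ExtWrite", "Store"): "dsync",
    ("ExtWrite", "Load"): "dsync",
    ("ExtWrite", "Fetch"): "sync",
    ("ExtWrite", "ExtRead"): "N/A",   # not defined by the SPU ISA
    ("ExtWrite", "ExtWrite"): "N/A",  # not defined by the SPU ISA
}


def required_sync(writer, access):
    """Return the primitive needed between a write and a later access.

    Raises KeyError for names that do not appear in Table 13-3.
    """
    return REQUIRED_SYNC[(writer, access)]
```

For example, `required_sync("Store", "Fetch")` returns `"sync"`, which corresponds to the self-modifying code case.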











13.4 Caching SPU Local Storage Access 


Implementations of the SPU can feature caches of local storage data for either instructions, data, or both. 
These caches must reflect data to and from the local storage when synchronization requires the state of local 
storage to be consistent. The dsync instruction ensures that modified data is visible to external devices that 
access local storage, and that data modified by these external devices is visible to subsequent loads and 
stores. The sync instructions also ensure that data modified by either stores or external puts is visible to a 
subsequent instruction fetch. For example, an instruction cache that does not snoop must be invalidated 
when sync is executed, and a copy-back data cache that does not snoop must be flushed and invalidated 
when either sync or dsync is executed. 


13.5 Self-Modifying Code 


SPU programs can store instructions in local storage and execute them. If the SPU has already read the
instructions from local storage, before the store, the new instructions are not seen by SPU execution.
Self-modifying code should always execute a sync instruction before executing the stored code. The sync
instruction ensures that all stores complete before the next instruction is fetched from local storage.
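
The hazard can be illustrated with a toy model in which the SPU keeps copies of instructions it has already read. The Python sketch below is an assumption for illustration only: the class name, the prefetch dictionary, and the string "instructions" are hypothetical, and a real implementation may buffer instructions quite differently.

```python
class FetchModel:
    """Toy SPU whose instruction fetch may use previously read copies."""

    def __init__(self, local_storage):
        self.local_storage = local_storage  # address -> instruction
        self.prefetched = {}                # copies already read by the SPU

    def prefetch(self, addr):
        """The SPU reads an instruction ahead of its execution."""
        self.prefetched[addr] = self.local_storage[addr]

    def store(self, addr, insn):
        """A store updates local storage but not the copies already read."""
        self.local_storage[addr] = insn

    def sync(self):
        """sync discards stale copies, so later fetches see prior stores."""
        self.prefetched.clear()

    def fetch(self, addr):
        if addr not in self.prefetched:
            self.prefetched[addr] = self.local_storage[addr]
        return self.prefetched[addr]


spu = FetchModel({0x100: "old"})
spu.prefetch(0x100)
spu.store(0x100, "new")
stale = spu.fetch(0x100)   # the old instruction can still be executed
spu.sync()
fresh = spu.fetch(0x100)   # after sync, the stored instruction is fetched
```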


13.6 External Local Storage Access 


Loads and stores do not necessarily access local storage in program order. Accesses from external devices 
can be interleaved in ways that are inconsistent with program order. The dsync instruction forces all 
preceding loads and stores to complete their local storage access before allowing any further loads or stores 
to be initiated, while sync ensures that the next instruction is fetched after the sync instruction is executed. 
An external device can synchronize with an SPU program through local storage access. 


Table 13-4 shows how an SPU program can reliably send to an external device, synchronizing only through 
the local storage. In this example, an SPU sends data through a buffer at address C to an external reader 
using a marker in local storage at address D. The SPU begins by storing the data to be transferred. It then 
executes a dsync instruction to force the data into local storage before it stores the marker. The dsync 
instruction also prevents the marker store from being reordered amongst the data stores. After the marker 
store, the SPU program must execute a dsync instruction again to force the marker into local storage. 


Table 13-5 shows how data can move from an external writer to the SPU program using local-storage-based 
synchronization. The SPU program starts by polling for the marker that indicates that data is ready. The 
polling loop begins with a dsync instruction that forces subsequent load instructions to get data from the now 
current local storage state. When the marker is found, the SPU program must execute a dsync instruction 
again to prevent the data loads from being performed before the marker load. If such reordering were to
occur, it would be possible for the marker write to occur between the reordered data loads and the delayed 
marker load. In this case, the data loads would receive stale data. 


Table 13-4. Sending Data and Synchronizing through Local Storage

External Device              SPU                  Comment
                             Store data to C
                             dsync                Force a subsequent store to follow the store to C; that is,
                                                  there will be no view of local storage where the marker is
                                                  present in D but the data is not yet in C.
                             Store marker to D
                             dsync                Force the store to D to be visible in local storage to
                                                  external readers.
eloop: Read D
If not marker, goto eloop
Read C
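
The need for the first dsync in Table 13-4 can be demonstrated with a toy store-buffer model. In this Python sketch, the `BufferedSpu` class, its `commit` method, and the use of the strings "C" and "D" as addresses are all assumptions for illustration; `commit` stands in for the implementation's freedom to perform buffered stores in any order.

```python
class BufferedSpu:
    """Toy SPU whose stores reach local storage at an unspecified time."""

    def __init__(self):
        self.local_storage = {}  # the state that external devices read
        self.pending = []        # stores issued but not yet performed

    def store(self, addr, value):
        self.pending.append((addr, value))

    def commit(self, index):
        """Perform one pending store, chosen arbitrarily by the hardware."""
        addr, value = self.pending.pop(index)
        self.local_storage[addr] = value

    def dsync(self):
        """Complete all prior stores before anything else proceeds."""
        while self.pending:
            self.commit(0)


# Without dsync, the marker can become visible before the data.
bad = BufferedSpu()
bad.store("C", "data")
bad.store("D", "marker")
bad.commit(1)                        # hardware happens to write D first
bad_view = dict(bad.local_storage)

# With the sequence of Table 13-4, the data is always in place first.
good = BufferedSpu()
good.store("C", "data")
good.dsync()
good.store("D", "marker")
view_before_marker = dict(good.local_storage)
good.dsync()
final_view = dict(good.local_storage)
```

In the first sequence an external reader polling D could find the marker while C is still empty; in the second sequence that view cannot occur.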




















Table 13-5. Receiving Data and Synchronizing through Local Storage

External Device      SPU                         Comment
Write data to A                                  This is the order in which the external device modifies
                                                 local storage. The ordering is not controlled by the SPU ISA.
Write marker to B
                     loop: dsync                 Force a subsequent load to access local storage, so that
                                                 the load arriving from B will get new data from local
                                                 storage.
                     Load from B
                     If not marker, goto loop    Ensure A and B are both written to local storage.
                     dsync                       Force a subsequent load to execute after the load from B.
                                                 Without this dsync, the load from A could be performed
                                                 before the load from B and get local storage contents
                                                 before the write to A.
                     Load from A                 Must get data from the write to A.
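
The stale-data case that the second dsync in Table 13-5 prevents can be reproduced in a toy model where the SPU may perform a load early and reuse the result. The Python sketch below is illustrative only; the `EarlyLoadSpu` class, its method names, and the string addresses "A" and "B" are assumptions, not part of the architecture.

```python
class EarlyLoadSpu:
    """Toy SPU that may perform loads early and reuse the loaded value."""

    def __init__(self, local_storage):
        self.local_storage = local_storage
        self.loaded = {}  # values obtained by loads performed so far

    def early_load(self, addr):
        """Hardware performs a later load ahead of program order."""
        self.loaded[addr] = self.local_storage.get(addr)

    def load(self, addr):
        if addr not in self.loaded:
            self.loaded[addr] = self.local_storage.get(addr)
        return self.loaded[addr]

    def dsync(self):
        """Later loads must access local storage after this point."""
        self.loaded.clear()


def receive(use_second_dsync):
    storage = {"A": "stale", "B": None}
    spu = EarlyLoadSpu(storage)
    spu.dsync()                # loop: dsync
    spu.early_load("A")        # load from A reordered before load from B
    storage["A"] = "fresh"     # external device writes data to A
    storage["B"] = "marker"    # external device writes marker to B
    assert spu.load("B") == "marker"   # marker found, leave the loop
    if use_second_dsync:
        spu.dsync()
    return spu.load("A")
```

Calling `receive(False)` returns the stale value even though the marker was seen, whereas `receive(True)` returns the data written by the external device.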


13.7 Speculation and Reordering of Channel Reads and Channel Writes 


The SPU does not reorder or speculatively execute channel reads or channel writes. All operations at the 
channel interface represent instructions in the order they occur in the program. 




13.8 Channel Interface with External Device 


The channel interface delivers channel reads and writes to the SPU interface in program order, but there are 
no ordering guarantees with respect to loads and stores. It is possible that a message sent to an external
device may trigger the external device to directly access local storage. SPU programs might want to use 
either sync or dsync instructions, or both, to order SPU loads and stores relative to the external accesses. 
Table 13-6 shows how an SPU program might reliably send and receive data from an external device 
synchronizing through the channel interface. 


Table 13-6. Synchronizing through the Channel Interface

External Device                   SPU                Comment

SPU receives data through local storage address A
Write data to A                                      The ordering is not controlled by the SPU ISA.
Send message to channel B
                                  rdch B             Wait for message
                                  dsync              Ensure load from A is executed after rdch, and
                                                     access the data in local storage
                                  load from A        Must get data

SPU sends data through local storage address C
                                  Store data to C
                                  dsync              Ensure data is in local storage
                                  wrch D             Send message
Receive message from channel D                       The ordering is not controlled by the SPU ISA.
Read data from C

















Note: The SPU architecture does not specify what actions an external device can perform in response to a 
channel read or write. The SPU does not wait for those actions to complete, and it does not synchronize the 
state of local storage before or after the channel operation. 


13.9 Execution State Set by an SPU Program through the Channel Interface 


Some SPU channels can control aspects of SPU execution state; for example, SRR0. State changes made
through channel writes might not affect subsequent instructions. Execution of the sync.c instruction ensures
that the new state does affect the next instruction.
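
A toy model makes the role of sync.c concrete. In the Python sketch below, the `Srr0Model` class and the `UNPREDICTABLE` sentinel are hypothetical names introduced for this example; the sentinel is returned whenever a channel write to SRR0 has not yet been followed by sync.c, to reflect that the architecture does not say which value an iret would use in that window.

```python
UNPREDICTABLE = object()  # model-only marker for an unspecified result


class Srr0Model:
    """Toy view of SRR0 as seen by iret versus as written through a channel."""

    def __init__(self, value=0):
        self.visible = value  # value that instruction execution uses
        self.pending = None   # value written by wrch, not yet propagated

    def wrch_srr0(self, value):
        """Model a write to the SPU_WrSRR0 channel."""
        self.pending = value

    def sync_c(self):
        """sync.c propagates state changed by prior wrch instructions."""
        if self.pending is not None:
            self.visible = self.pending
            self.pending = None

    def iret_target(self):
        """Address an iret would branch to in this model."""
        if self.pending is not None:
            return UNPREDICTABLE
        return self.visible


srr0 = Srr0Model(0x40)
srr0.wrch_srr0(0x80)
before = srr0.iret_target()  # no sync.c yet: result is not defined
srr0.sync_c()
after = srr0.iret_target()   # the new value is now guaranteed to be used
```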


13.10 Execution State Set by an External Device 


Execution state changes made by an external device are ordered with respect to other externally requested 
state changes but not with respect to SPU instruction execution. The external device can stop the SPU, make 
execution state changes, start the SPU, and be certain the new state is visible to program execution. 
Appendix A. Instruction Table Sorted by Instruction Mnemonic


Table A-1. Instructions Sorted by Mnemonic

Mnemonic Instruction Page
a Add Word 60 
absdb Absolute Differences of Bytes 92 
addx Add Extended 66 
ah Add Halfword 58 
ahi Add Halfword Immediate 59 
ai Add Word Immediate 61 
and And 97 
andbi And Byte Immediate 99 
andc And with Complement 98 
andhi And Halfword Immediate 100 
andi And Word Immediate 101 
avgb Average Bytes 91 
bg Borrow Generate 70 
bgx Borrow Generate Extended 71 
bi Branch Indirect 178 
bihnz Branch Indirect If Not Zero Halfword 189 
bihz Branch Indirect If Zero Halfword 188 
binz Branch Indirect If Not Zero 187 
bisl Branch Indirect and Set Link 181 
bisled Branch Indirect and Set Link if External Data 180 
biz Branch Indirect If Zero 186 
br Branch Relative 174 
bra Branch Absolute 175 
brasl Branch Absolute and Set Link 177 
brhnz Branch If Not Zero Halfword 184 
brhz Branch If Zero Halfword 185 
brnz Branch If Not Zero Word 182 
brsl Branch Relative and Set Link 176 
brz Branch If Zero Word 183 
cbd Generate Controls for Byte Insertion (d-form) 40 
cbx Generate Controls for Byte Insertion (x-form) 41 
cdd Generate Controls for Doubleword Insertion (d-form) 46 
cdx Generate Controls for Doubleword Insertion (x-form) 47 
ceq Compare Equal Word 160 
ceqb Compare Equal Byte 156 
ceqbi Compare Equal Byte Immediate 157 
ceqh Compare Equal Halfword 158 
ceqhi Compare Equal Halfword Immediate 159 
ceqi Compare Equal Word Immediate 161 
cflts Convert Floating to Signed Integer 221 
cfltu Convert Floating to Unsigned Integer 223 

cg Carry Generate 67 
cgt Compare Greater Than Word 166 
cgtb Compare Greater Than Byte 162 
cgtbi Compare Greater Than Byte Immediate 163 
cgth Compare Greater Than Halfword 164 
cgthi Compare Greater Than Halfword Immediate 165 
cgti Compare Greater Than Word Immediate 167 
cgx Carry Generate Extended 68 
chd Generate Controls for Halfword Insertion (d-form) 42 
chx Generate Controls for Halfword Insertion (x-form) 43 
clgt Compare Logical Greater Than Word 172 
clgtb Compare Logical Greater Than Byte 168 
clgtbi Compare Logical Greater Than Byte Immediate 169 
clgth Compare Logical Greater Than Halfword 170 
clgthi Compare Logical Greater Than Halfword Immediate 171 
clgti Compare Logical Greater Than Word Immediate 173 
clz Count Leading Zeros 83 
cntb Count Ones in Bytes 84 
csflt Convert Signed Integer to Floating 220
cuflt Convert Unsigned Integer to Floating 222 
cwd Generate Controls for Word Insertion (d-form) 44 
cwx Generate Controls for Word Insertion (x-form) 45 
dfa Double Floating Add 203 
dfceq Double Floating Compare Equal 226 
dfcgt Double Floating Compare Greater Than 228 
dfcmeq Double Floating Compare Magnitude Equal 227 
dfcmgt Double Floating Compare Magnitude Greater Than 229 
dfm Double Floating Multiply 207 
dfma Double Floating Multiply and Add 209 
dfms Double Floating Multiply and Subtract 213 
dfnma Double Floating Negative Multiply and Add 214 
dfnms Double Floating Negative Multiply and Subtract 213
dfs Double Floating Subtract 205 
dftsv Double Floating Test Special Value 230 
dsync Synchronize Data 243 
eqv Equivalent 114 
fa Floating Add 202 
fceq Floating Compare Equal 231 
fcgt Floating Compare Greater Than 233
fcmeq Floating Compare Magnitude Equal 232 
fcmgt Floating Compare Magnitude Greater Than 234 
fesd Floating Extend Single to Double 225 
fi Floating Interpolate 219 
fm Floating Multiply 206 
fma Floating Multiply and Add 208 
fms Floating Multiply and Subtract 212 
fnms Floating Negative Multiply and Subtract 210 
frds Floating Round Double to Single 224 
frest Floating Reciprocal Estimate 215 
frsqest Floating Reciprocal Absolute Square Root Estimate 217 
fs Floating Subtract 204 
fscrrd Floating-Point Status and Control Register Read 236
fscrwr Floating-Point Status and Control Register Write 235
fsm Form Select Mask for Words 87 
fsmb Form Select Mask for Bytes 85 
fsmbi Form Select Mask for Bytes Immediate 55 
fsmh Form Select Mask for Halfwords 86 
gb Gather Bits from Words 90 
gbb Gather Bits from Bytes 88 
gbh Gather Bits from Halfwords 89 
hbr Hint for Branch (r-form) 192 
hbra Hint for Branch (a-form) 193 
hbrr Hint for Branch Relative 194 
heq Halt If Equal 150 
heqi Halt If Equal Immediate 151 
hgt Halt If Greater Than 152 
hgti Halt If Greater Than Immediate 153 
hlgt Halt If Logically Greater Than 154
hlgti Halt If Logically Greater Than Immediate 155 
il Immediate Load Word 52 
ila Immediate Load Address 53 
ilh Immediate Load Halfword 50 
ilhu Immediate Load Halfword Upper 51 
iohl Immediate Or Halfword Lower 54 
iret Interrupt Return 179 
lnop No Operation (Load) 240
lqa Load Quadword (a-form) 34
lqd Load Quadword (d-form) 32
lqr Load Quadword Instruction Relative (a-form) 35
lqx Load Quadword (x-form) 33
mfspr Move from Special-Purpose Register 244 
mpy Multiply 72 
mpya Multiply and Add 76 
mpyh Multiply High 77 
mpyhh Multiply High High 79 
mpyhha Multiply High High and Add 80 
mpyhhau Multiply High High Unsigned and Add 82 
mpyhhu Multiply High High Unsigned 81 
mpyi Multiply Immediate 74 
mpys Multiply and Shift Right 78 
mpyu Multiply Unsigned 73 
mpyui Multiply Unsigned Immediate 75 
mtspr Move to Special-Purpose Register 245 
nand Nand 112 
nop No Operation (Execute) 241 
nor Nor 113 
or Or 102 
orbi Or Byte Immediate 104 
orc Or with Complement 103 
orhi Or Halfword Immediate 105 
ori Or Word Immediate 106 
orx Or Across 107 
rchcnt Read Channel Count 249 
rdch Read Channel 248 
rot Rotate Word 129 
roth Rotate Halfword 127 
rothi Rotate Halfword Immediate 128 

rothm Rotate and Mask Halfword 136 
rothmi Rotate and Mask Halfword Immediate 137 
roti Rotate Word Immediate 130 
rotm Rotate and Mask Word 138 
rotma Rotate and Mask Algebraic Word 147 
rotmah Rotate and Mask Algebraic Halfword 145 
rotmahi Rotate and Mask Algebraic Halfword Immediate 146 
rotmai Rotate and Mask Algebraic Word Immediate 148 
rotmi Rotate and Mask Word Immediate 139 
rotqbi Rotate Quadword by Bits 134 
rotqbii Rotate Quadword by Bits Immediate 135 
rotqby Rotate Quadword by Bytes 131 
rotqbybi Rotate Quadword by Bytes from Bit Shift Count 133 
rotqbyi Rotate Quadword by Bytes Immediate 132 
rotqmbi Rotate and Mask Quadword by Bits 143 

rotqmbii Rotate and Mask Quadword by Bits Immediate 144 

rotqmby Rotate and Mask Quadword by Bytes 140 

rotqmbybi Rotate and Mask Quadword Bytes from Bit Shift Count 142 

rotqmbyi Rotate and Mask Quadword by Bytes Immediate 141 
selb Select Bits 115 

sf Subtract from Word 64 
sfh Subtract from Halfword 62 
sfhi Subtract from Halfword Immediate 63 
sfi Subtract from Word Immediate 65 
sfx Subtract from Extended 69 
shl Shift Left Word 120 
shlh Shift Left Halfword 118
shlhi Shift Left Halfword Immediate 119 
shli Shift Left Word Immediate 121 

shlqbi Shift Left Quadword by Bits 122 
shlqbii Shift Left Quadword by Bits Immediate 123 
shlqby Shift Left Quadword by Bytes 124 
shlqbybi Shift Left Quadword by Bytes from Bit Shift Count 126
shlqbyi Shift Left Quadword by Bytes Immediate 125
shufb Shuffle Bytes 116
stop Stop and Signal 238 
stopd Stop and Signal with Dependencies 239 
stqa Store Quadword (a-form) 38 
stqd Store Quadword (d-form) 36 
stqr Store Quadword Instruction Relative (a-form) 39 
stqx Store Quadword (x-form) 37 
sumb Sum Bytes into Halfwords 93 
sync Synchronize 242 
wrch Write Channel 250 
xor Exclusive Or 108 
xorbi Exclusive Or Byte Immediate 109 
xorhi Exclusive Or Halfword Immediate 110 
xori Exclusive Or Word Immediate 111 
xsbh Extend Sign Byte to Halfword 94 
xshw Extend Sign Halfword to Word 95 
xswd Extend Sign Word to Doubleword 96 
Appendix B. Details of the Generate Controls Instructions


The tables in this section show the details of the masks that are generated by the eight generate controls 
instructions. The masks that are shown are intended for use as the RC operand of the shuffle bytes, shufb, 
instruction. Each row in a table shows the rightmost 4 bits of the effective address. An x in the first column 
indicates an ignored bit. Blanks within the “created mask" are shown only to improve clarity. 


See the following tables, as applicable:

* For byte insertion, see Table B-1 Byte Insertion: Rightmost 4 Bits of the Effective Address and Created
  Mask on page 265.
* For halfword insertion, see Table B-2 Halfword Insertion: Rightmost 4 Bits of the Effective Address and
  Created Mask on page 266.
* For word insertion, see Table B-3 Word Insertion: Rightmost 4 Bits of the Effective Address and Created
  Mask on page 266.
* For doubleword insertion, see Table B-4 Doubleword Insertion: Rightmost 4 Bits of Effective Address and
  Created Mask on page 266.


Table B-1. Byte Insertion: Rightmost 4 Bits of the Effective Address and Created Mask

Rightmost 4 Bits of the Effective Address    Created Mask
0000                                         03 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0001                                         10 03 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0010                                         10 11 03 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0011                                         10 11 12 03 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0100                                         10 11 12 13 03 15 16 17 18 19 1a 1b 1c 1d 1e 1f
0101                                         10 11 12 13 14 03 16 17 18 19 1a 1b 1c 1d 1e 1f
0110                                         10 11 12 13 14 15 03 17 18 19 1a 1b 1c 1d 1e 1f
0111                                         10 11 12 13 14 15 16 03 18 19 1a 1b 1c 1d 1e 1f
1000                                         10 11 12 13 14 15 16 17 03 19 1a 1b 1c 1d 1e 1f
1001                                         10 11 12 13 14 15 16 17 18 03 1a 1b 1c 1d 1e 1f
1010                                         10 11 12 13 14 15 16 17 18 19 03 1b 1c 1d 1e 1f
1011                                         10 11 12 13 14 15 16 17 18 19 1a 03 1c 1d 1e 1f
1100                                         10 11 12 13 14 15 16 17 18 19 1a 1b 03 1d 1e 1f
1101                                         10 11 12 13 14 15 16 17 18 19 1a 1b 1c 03 1e 1f
1110                                         10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 03 1f
1111                                         10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 03
Table B-2. Halfword Insertion: Rightmost 4 Bits of the Effective Address and Created Mask

Rightmost 4 Bits of the Effective Address    Created Mask
000x                                         0203 1213 1415 1617 1819 1a1b 1c1d 1e1f
001x                                         1011 0203 1415 1617 1819 1a1b 1c1d 1e1f
010x                                         1011 1213 0203 1617 1819 1a1b 1c1d 1e1f
011x                                         1011 1213 1415 0203 1819 1a1b 1c1d 1e1f
100x                                         1011 1213 1415 1617 0203 1a1b 1c1d 1e1f
101x                                         1011 1213 1415 1617 1819 0203 1c1d 1e1f
110x                                         1011 1213 1415 1617 1819 1a1b 0203 1e1f
111x                                         1011 1213 1415 1617 1819 1a1b 1c1d 0203





Table B-3. Word Insertion: Rightmost 4 Bits of the Effective Address and Created Mask

Rightmost 4 Bits of the Effective Address    Created Mask

00xx 00010203 14151617 18191a1b 1c1d1e1f
01xx 10111213 00010203 18191a1b 1c1d1e1f
10xx 10111213 14151617 00010203 1c1d1e1f
11xx 10111213 14151617 18191a1b 00010203





Table B-4. Doubleword Insertion: Rightmost 4 Bits of Effective Address and Created Mask

Rightmost 4 Bits of the Effective Address    Created Mask

0xxx 0001020304050607 18191a1b1c1d1e1f
1xxx 1011121314151617 0001020304050607
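
The pattern in Tables B-1 through B-4 can be sketched in a few lines of Python. This is an illustrative model only, not text from the architecture: the function names `insertion_mask` and `shuffle_bytes` are hypothetical, and `shuffle_bytes` models just the plain byte-select behavior that these masks rely on (control bytes 0x00 through 0x1f), ignoring any special control-byte encodings of the real shufb instruction.

```python
def insertion_mask(ea_low4: int, size: int) -> bytes:
    """Model of the 16-byte mask listed in Tables B-1 to B-4.

    ea_low4: rightmost 4 bits of the effective address.
    size:    element size in bytes (1, 2, 4, or 8).
    """
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    # Address bits below the element size are ignored (the 'x' bits in the tables).
    start = (ea_low4 & 0xF) & ~(size - 1)
    # By default, mask byte i is 0x10 + i.
    mask = bytearray(0x10 + i for i in range(16))
    # The inserted element is marked with the values shown in the tables:
    # 03 (byte), 02 03 (halfword), 00..03 (word), 00..07 (doubleword).
    first = {1: 3, 2: 2, 4: 0, 8: 0}[size]
    for k in range(size):
        mask[start + k] = first + k
    return bytes(mask)


def shuffle_bytes(a: bytes, b: bytes, control: bytes) -> bytes:
    """Simplified byte select: control values 0x00-0x0f pick from a, 0x10-0x1f from b."""
    if not (len(a) == len(b) == len(control) == 16):
        raise ValueError("all operands must be 16 bytes")
    both = a + b
    return bytes(both[c & 0x1F] for c in control)


if __name__ == "__main__":
    # Insert the byte held at position 3 of `value` into byte 5 of `quadword`.
    value = bytes([0xAA, 0xBB, 0xCC, 0xDD] + [0] * 12)
    quadword = bytes(range(16))
    print(insertion_mask(0b0101, 1).hex(" "))
    print(shuffle_bytes(value, quadword, insertion_mask(0b0101, 1)).hex(" "))
```

Each call to `insertion_mask` reproduces one table row; passing the result to `shuffle_bytes` shows how such a mask merges a new element into an existing quadword while leaving the other bytes unchanged.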

architecture A detailed specification of requirements for a processor or computer system. It
does not specify details of how the processor or computer system must be implemented;
instead it provides a template for a family of compatible implementations.


big-endian A byte-ordering method in memory where the address n of a word corresponds to
the most significant byte. In an addressed memory word, the bytes are ordered (left
to right) 0, 1, 2, 3, with 0 being the most significant byte. See little-endian.


bisled Branch indirect and set link if external data instruction.


cache High-speed memory close to a processor. A cache usually contains recently
accessed data or instructions, but certain cache-control instructions can lock, evict,
or otherwise modify the caching of data or instructions.


CBEA See Cell Broadband Engine Architecture.


Cell Broadband Engine Architecture Extends the PowerPC 64-bit architecture with loosely coupled cooperative off-load
processors. The Cell Broadband Engine Architecture provides a basis for the
development of microprocessors targeted at the game, multimedia, and real-time
market segments. The Cell Broadband Engine is one implementation of the Cell
Broadband Engine Architecture.


channel Channels are unidirectional, function-specific registers or queues. They are the
primary means of communication between an SPE's SPU and its MFC, which in
turn mediates communication with the PPEs, other SPEs, and other devices.
These other devices use MMIO registers in the destination SPE to transfer information
on the channel interface of that destination SPE.

Specific channels have read or write properties, and blocking or nonblocking properties.
Software on the SPU uses channel commands to enqueue DMA
commands, query DMA and processor status, perform MFC synchronization,
access auxiliary resources such as the decrementer (timer), and perform
interprocessor communication via mailboxes and signal-notification.


DBZ Divide by zero.


DIFF IEEE noncompliant result.


DMA Direct memory access. A technique for using a special-purpose controller to
generate the source and destination addresses for a memory or I/O transfer.


double precision The specification that causes a floating-point value to be stored (internally) in the
long format (two computer words).


effective address An address generated or used by a program to reference memory. A
memory-management unit translates an effective address (EA) to a virtual address (VA),
which it then translates to a real address (RA) that accesses real (physical)
memory. The maximum size of the effective-address space is 2^64 bytes.


exception An error, unusual condition, or external signal that may alter a status bit and will
cause a corresponding interrupt, if the interrupt is enabled.


fetch Retrieving instructions from either the cache or system memory and placing them
into the instruction queue.


floating point A way of representing real numbers (that is, values with fractions or decimals) in 32 
bits or 64 bits. Floating-point representation is useful to describe very small or very 
large numbers. 


FPU Floating-point unit. 
fscrrd Floating-Point Status and Control Register read instruction. 
fscrwr Floating-Point Status and Control Register write instruction. 


general purpose register An explicitly addressable register that can be used for a variety of purposes (for 
example, as an accumulator or an index register). 


GPR See general purpose register. 


guarded Prevented from responding to speculative loads and instruction fetches. The
operating system typically implements guarding, for example, on all I/O devices.


implementation A particular processor that conforms to the architecture, but may differ from other
architecture-compliant implementations, for example, in design, feature set, and
implementation of optional features.


Inf Infinity. 


instruction cache A cache for providing program instructions to the processor faster than they can be 
obtained from system memory. 


INV Invalid operation. 

INX Inexact result. 

iohl Immediate or halfword lower instruction. 

iret interrupt return instruction 

ISA Instruction set architecture. 

KB Kilobyte. 

least significant bit The bit of least value in an address, register, data element, or instruction encoding. 

least significant byte The byte of least value in an address, register, data element, or instruction 
encoding. 

little-endian A byte-ordering method in memory where the address n of a word corresponds to
the least significant byte. In an addressed memory word, the bytes are ordered (left
to right) 3, 2, 1, 0, with 3 being the most significant byte. See big-endian.


local storage The storage associated with each SPE. It holds both instructions and data. 


LSA Local Storage Address. An address in the LS of an SPU, by which programs 
running in the SPU and DMA transfers managed by the MFC access the LS. 


LSb See least significant bit.


mask A pattern of bits used to accept or reject bit patterns in another set of data.
Hardware interrupts are enabled and disabled by setting or clearing a string of bits, with
each interrupt assigned a bit position in a mask register.


MFC Memory flow controller. It is part of an SPE and provides two main functions:
it moves data using DMA between the SPE's local storage (LS) and main storage,
and it synchronizes the SPU with the rest of the processing units in the system.


mfspr Move from special-purpose register instruction.


most significant bit The highest-order bit in an address, register, data element, or instruction
encoding.


most significant byte The highest-order byte in an address, register, data element, or instruction
encoding.


MSb See most significant bit.

MSB See most significant byte.

MSR Machine state register.

mtspr Move to special-purpose register instruction.

NaN Not a number.

OVF Overflow.

PC Program counter.


PowerPC Of or relating to the PowerPC Architecture or the microprocessors that implement
this architecture.


PowerPC Architecture A computer architecture that is based on the third generation of reduced instruction
set computer (RISC) processors. The PowerPC architecture was developed jointly
by Apple, Motorola, and IBM.


PPE PowerPC Processor Element. A general-purpose processor in the Cell Broadband
Engine.


QNaN Quiet NaN.

rchcnt Read channel counter instruction.

rdch Read from channel instruction.

RN0 Rounding control for slice 0 of the 2-way SIMD double-precision operations.

RN1 Rounding control for slice 1 of the 2-way SIMD double-precision operations.

RO Relative offset.

ROH Relative offset high.

ROL relative offset low

RTL register transfer language 

shufb shuffle bytes instruction 

signal Information sent on a signal-notification channel. These channels are inbound (to
an SPE) registers. They can be used by a PPE or other processor to send information
to an SPE. Each SPE has two 32-bit signal-notification registers, each of which
has a corresponding memory-mapped I/O (MMIO) register into which the
signal-notification data is written by the sending processor. Unlike mailboxes, they can be
configured for either one-to-one or many-to-one signalling.


SIMD Single instruction, multiple data. Processing in which a single instruction operates 
on multiple data elements that make up a vector data-type. Also known as vector 
processing. This style of programming implements data-level parallelism. 


SNaN Signalling NaN. 


snoop To compare an address on a bus with a tag in a cache, to detect operations that 
violate memory coherency. 


SPR Special-purpose register. 


SPU Synergistic Processor Unit. The part of an SPE that executes instructions from its 
local storage (LS). 


SRAM static random access memory 

sync Synchronize command. 

synchronization The process of arranging storage operations to complete in the order of
occurrence.

UNF Underflow

vector An instruction operand containing a set of data elements packed into a
one-dimensional array. The elements can be fixed-point or floating-point values. Most
Vector/SIMD Multimedia Extension and SPU SIMD instructions operate on vector
operands. Vectors are also called SIMD operands or packed operands.


word Four bytes. 


wrch Write to channel instruction. 

Index


Symbols


&, defined, 20
*, defined, 20
+, defined, 20
-, defined, 20
/, //, ///, defined, 19, 20
=, defined, 20
|*|, defined, 20
|, defined, 20
defined, 20
>, defined, 20
z, defined, 20
⊕, defined, 20
defined, 20


Numerics


10-bit immediate, 19
16-bit immediate, 19
32-bit channels, 248, 250
32-bit registers, 244, 245
7-bit immediate, 19
8-bit immediate, 19


A


a, 60
absdb, 92
absolute differences of bytes, 92
add extended, 66
add halfword, 58
add halfword immediate, 59
add word, 60
add word immediate, 61
addition, two's complement, 20
additional resources, 13
addx, 66
ah, 58
ahi, 59
ai, 61
algebraic right shift, 136, 137, 138, 139
and, 97
and (mnemonic), 97
and byte immediate, 99
and halfword immediate, 100
and with complement, 98
and word immediate, 101
AND, defined, 20
andbi, 99
andc, 98
andhi, 100
andi, 101
architectural overview, 25
assignment symbol, 20
audience, for manual, 13
average bytes, 91
avgb, 91


B


bg, 70
bgx, 71
bi, 178
bi instruction, 251
bihnz, 189
bihnz instruction, 251
bihz, 188
bihz instruction, 251
binary values in register RC and byte results, 116
binz, 187
binz instruction, 251
bisl, 181
bisl instruction, 251
bisled, 180
bisled instruction, 251
bit and byte numbering, 26—27
bit encoding, conventions for, 16
bit ordering, conventions for, 16
bit ranges, conventions for, 17
biz, 186
biz instruction, 251
borrow generate, 70
borrow generate extended, 71
br, 174
bra, 175
branch absolute, 175
branch absolute and set link, 177
branch if not zero halfword, 184
branch if not zero word, 182
branch if zero halfword, 185
branch if zero word, 183
branch indirect, 178
branch indirect and set link, 181
branch indirect and set link if external data, 180
branch indirect if not zero, 187
branch indirect if not zero halfword, 189
branch indirect if zero, 186
branch indirect if zero halfword, 188
branch instructions, 149—189
branch relative, 174
branch relative and set link, 176
brasl, 177
brhnz, 184
brhz, 185
brinst variable, 191
brnz, 182

brsl, 176 

brtarg variable, 191 

brz, 183 

byte insertion, 265 

byte ordering, conventions for, 16 


C 


caching, SPU local storage access, 254, 256 
carry generate, 67 

carry generate extended, 68 

cbd, 40 

cbx, 41 

cdd, 46 

cdx, 47 

ceq, 160 

ceqb, 156 

ceqbi, 157 

ceqh, 158

ceqhi, 159

ceqi, 161 

cflts, 221 

cfltu, 223 

cg, 67 

cgt, 166 

cgtb, 162 

cgtbi, 163 

cgth, 164 

cgthi, 165 

cgti, 167

cgx, 68 

channel instructions, 247—250 
channel interface, 258 

channel reads and writes, 257 
channels, conventions for, 17 

chd, 42 

chx, 43 

clgt, 172 

clgtb, 168 

clgtbi, 169 

clgth, 170 

clgthi, 171 

clgti, 173 

clz, 83 

cntb, 84 

compare equal byte, 156 

compare equal byte immediate, 157 
compare equal halfword, 158 
compare equal halfword immediate, 159 
compare equal word, 160 

compare equal word immediate, 161 
compare greater than byte, 162 
compare greater than byte immediate, 163 
compare greater than halfword, 164 
compare greater than halfword immediate, 165

compare greater than word, 166 

compare greater than word immediate, 167 

compare instructions, 149—189 

compare instructions for floating point. See floating-point 
instructions 

compare logical greater than byte, 168 

compare logical greater than byte immediate, 169 

compare logical greater than halfword, 170 

compare logical greater than halfword immediate, 171 

compare logical greater than word, 172 

compare logical greater than word immediate, 173 

compute-mask instructions. See generate controls instructions

conditional branch instructions, described, 149 

conditional execution, 20 

constant-formation instructions, 49—55 

control instructions, 237—245 

conventions used in this manual, 16—17 

convert floating to signed integer, 221 

convert floating to unsigned integer, 223 

convert signed integer to floating, 220 

convert unsigned integer to floating, 222 

converting between single and double-precision formats, 
198 

count leading zeros, 83 

count ones in bytes, 84 

csflt, 220

cuflt, 222 

cwd, 44 

cwx, 45 


D 


data layout in registers, 28 

data representation, 25 

DBZ. See divide-by-zero (DBZ) exception condition 

DENORM. See denormal input forced to zero (DENORM) 
exception condition 

denormal input forced to zero (DENORM) exception condition, 198, 199

denorms, support for, 196 

dfa, 203 

dfceq, 226 

dfcgt, 228 

dfcmeq, 227 

dfcmgt, 229 

dfm, 207 

dfma, 209 

dfms, 213 

dfnma, 214 

dfnms, 211 

dfs, 205 

dftsv, 230 

divide-by-zero (DBZ) exception condition, 196 
do ... while (cond), 20
document organization, 13 
documents, related, 13 
double floating add, 203 
double floating compare equal, 226 
double floating compare greater than, 228 
double floating compare magnitude equal, 227 
double floating compare magnitude greater than, 229 
double floating multiply, 207 
double floating multiply and add, 209 
double floating multiply and subtract, 213 
double floating negative multiply and add, 214 
double floating negative multiply and subtract, 211 
double floating subtract, 205 
double floating test special value, 230 
double-precision (IEEE mode) minimum and maximum 
values, 197 
double-precision format, converting, 198 
double-precision operations, 197 
doubleword insertion, 266 
doublewords 
bit and byte numbering, 26 
dsync, 243 
dsync instruction, 254, 255 
caching SPU local storage access and, 256 
external local storage access and, 256 
SPU loads and stores and, 258 


E 


equals sign, 20 
equivalent, 114 
eqv, 114 
example LSLR values and corresponding local storage 
sizes, 31 
exception conditions, 198 
denormal input forced to zero (DENORM), 199 
divide-by-zero (DBZ), 196 
IEEE noncompliant result (DIFF), 196 
inexact result (INX), 198 
invalid operation (INV), 199 
not propagated NaN, 199 
overflow (OVF), 196, 198 
underflow (UNF), 196, 199 
exception settings for instructions, 200 
exclusive or, 108 
exclusive or byte immediate, 109 
exclusive or halfword immediate, 110 
exclusive OR symbol, 20 
exclusive or word immediate, 111 
execution state 
set by an external device, 258 
set by an SPU program through the channel interface, 
258 
execution state set by an external device, 258 
execution state set by an SPU program through the channel interface, 258

extend sign byte to halfword, 94 
extend sign halfword to word, 95 
extend sign word to doubleword, 96 
extended-range mode operations, 195 
extending numbers, 195, 198 
external device 

behavior of, 253 

channel interface and, 258 

setting execution state, 258 
external local storage access, 256 
extread transaction, 253 
extwrite transaction, 253 


F 


fa, 202 

fceq, 231 

fcgt, 233 

fcmeq, 232 

fcmgt, 234 

feature bits (d) and (e), settings and results, 251 
features of SPU ISA, 23 

fesd, 225 

fetch transaction, 253 

fi, 219 

fields, conventions for, 17 

floating add, 202 

floating compare equal, 231 

floating compare greater than, 233 

floating compare magnitude equal, 232 

floating compare magnitude greater than, 234 
floating extend single to double, 225 

floating interpolate, 219 

floating multiply, 206 

floating multiply and add, 208 

floating multiply and subtract, 212 

floating negative multiply and subtract, 210 
floating reciprocal absolute square root estimate, 217 
floating reciprocal estimate, 215 

floating round double to single, 224 

floating subtract, 204 

floating-point instructions, 195—236 
Floating-Point Status and Control Register, 200—201 
floating-point status and control register read, 236 
floating-point status and control register write, 235 
fm, 206 

fma, 208 

fms, 212 

fnms, 210 

for ... end, 20 

form select mask for bytes, 85 

form select mask for bytes immediate, 55 

form select mask for halfwords, 86 
form select mask for words, 87
format 
for instruction descriptions, 15 
of Floating-Point Status and Control Register, 200 
FPSCR. See Floating-Point Status and Control Register 
frds, 224 
frest, 215 
frsqest, 217 
fs, 204 
fscrrd, 236 
fscrrd instruction, 200 
fscrwr, 235 
fscrwr instruction, 200 
fsm, 87 
fsmb, 85 
fsmbi, 55 
fsmh, 86 


G 


gather bits from bytes, 88 
gather bits from halfwords, 89 
gather bits from words, 90 
gb, 90 
gbb, 88 
gbh, 89 
general-purpose register fields, 19 
generate controls
for byte insertion (d-form), 40
for byte insertion (x-form), 41
for doubleword insertion (d-form), 46
for doubleword insertion (x-form), 47
for halfword insertion (d-form), 42
for halfword insertion (x-form), 43
for insertion instructions, 40—47
for word insertion (d-form), 44
for word insertion (x-form), 45
generate controls instructions, details, 265
GPR fields, 19


H, I, J, K 


halfword insertion, 266 
halfwords 
bit and byte numbering, 26 
halt if equal, 150 
halt if equal immediate, 151 
halt if greater than, 152 
halt if greater than immediate, 153 
halt if logically greater than, 154 
halt if logically greater than immediate, 155 
halt instructions, 149—189 
hbr, 192 
hbra, 193 
hbrr, 194
heq, 150 
heqi, 151 
hgt, 152 
hgti, 153 
hint for branch (a-form), 193 
hint for branch (r-form), 192 
hint for branch relative, 194 
hint-for-branch instructions, 191 
hlgt, 154
hlgti, 155
I10, defined, 19
I16, defined, 19
I7, defined, 19
I8, defined, 19
IEEE noncompliant result (DIFF) exception condition, 196 
IEEE Standard 754, 195 
IEEE standard floating point versus SPU floating point, 
195 
IEEE standard, compliance with, 198 
if (cond) then ... else ..., 20 
il, 52 
ila, 53 
ilh, 50 
ilhu, 51 
immediate load address, 53 
immediate load halfword, 50 
immediate load halfword upper, 51 
immediate load word, 52 
immediate or halfword lower, 54 
inexact result (INX) exception condition, 198 
Inf. See infinity (Inf), support for 
infinity (Inf), support for, 195 
inline prefetching, 192 
inserting bytes, 265 
inserting doublewords, 266 
inserting halfwords, 266 
inserting words, 266 
instruction descriptions 
format for, 15 
how to use, 15 
instruction fields, 19 
instruction formats, 28 
instruction mnemonics, 259—264 
instruction operation notations, 20 
instructions 
branch, 149—189 
channel, 247—250 
compare, 149—189 
constant-formation, 49—55 
control, 237—245 
described, 16 
exception settings and, 196, 200 
floating point, 195—236 
halt, 149—189 
integer, 57—116 
logical, 57—116
memory—load and store, 31—41
reserved field, 19
rotate, 117—148
shift, 117—148
sorted by mnemonic, 259
integer instructions, 57—116
internal execution state, 254
interrupt facility, 251, 252
interrupt handler, 251
interrupt return, 179
INV. See invalid operation (INV) exception condition
invalid operation (INV) exception condition, 199
INX. See inexact result (INX) exception condition
iohl, 54
iret, 179
iret instruction, 251, 252
ISA support, 23, 24


L


legal notices, 2
lnop, 240
load and store instructions, 31—41
load quadword (a-form), 34
load quadword (d-form), 32
load quadword (x-form), 33
load quadword instruction relative (a-form), 35
load transaction, 253
load/store architecture, 23
local storage
synchronizing multiple accesses to, 256
synchronizing through, 257
local storage access. See SPU local storage access
local storage address, 20
Local Storage Limit Register, 20, 31
local storage transactions, 253
LocStor(x,y), 18
LocStor, defined, 20
logical comparison instructions, described, 149
logical instructions, 57—116
logical right shift, 136, 137, 138, 139
loops, 20
lqa, 34
lqd, 32
lqr, 35
lqx, 33
LSA. See local storage address
LSLR values, 31
LSLR. See Local Storage Limit Register


M


manual
conventions for, 16
organization of, 13
purpose of, 13, 23
memory instructions, 31—41
mfspr, 244
minimum and maximum values
double-precision (IEEE mode), 197
single-precision (extended-range mode), 195, 198
mnemonics, 16, 259—264
move from special-purpose register, 244
move to special-purpose register, 245
mpy, 72
mpya, 76
mpyh, 77
mpyhh, 79
mpyhha, 80
mpyhhau, 82
mpyhhu, 81
mpyi, 74
mpys, 78
mpyu, 73
mpyui, 75
mtspr, 245
multiply, 72
multiply and add, 76
multiply and shift right, 78
multiply high, 77
multiply high high, 79
multiply high high and add, 80
multiply high high unsigned, 81
multiply high high unsigned and add, 82
multiply immediate, 74
multiply unsigned, 73
multiply unsigned immediate, 75


N


NaN. See not a number (NaN), support for
NaN. See not propagated NaN exception condition
nand, 112
nand (mnemonic), 112
negative zero, 195
nested interrupts, support for, 252
no operation (execute), 241
no operation (load), 240
nontrap exception handling, 198
nop, 241
nor, 113
nor (mnemonic), 113
not a number (NaN), support for, 195
not equals sign, 20
not propagated NaN exception condition, 199
notations used in this manual, 16
numbers
extending, 195, 198 
rounding, 198 


O 


OP or OPCD, defined, 19 
opcode instruction field, 19 
operands, described, 16 
operations 

double-precision, 197 

single precision (extended-range mode), 195 
or, 102 
or (mnemonic), 102 
or across, 107 
or byte immediate, 104 
or halfword immediate, 105 
or with complement, 103 
or word immediate, 106 
OR, defined, 20 
orbi, 104 
orc, 103 
ordering of transactions, 253 
organization of manual, 13 
orhi, 105 
ori, 106 
orx, 107 
overflow (OVF) exception condition, 196, 198 
OVF. See overflow (OVF) exception condition 


P 


positive zero, 195 

primitives, synchronization, 254 
program counter, 18 
propagation of NaNs, 197 
purpose of manual, 13, 23 


Q 


quadwords 
bit and byte numbering, 27 


R 


RA field, described, 19 
RB field, described, 19 
RC field, described, 19 
rchcnt, 249

rdch, 248 

read channel, 248 

read channel count, 249 
reference documents, 13
reference materials, 13 
register layout of data types, 28 
register transfer language (RTL) instruction definitions, 18 
registers 
conventions for, 17 
data layout in, 28 
reordering of channel reads and channel writes, 257 
reordering SPU local storage access, 254 
RepLeftBit(x,y), 18 
representation 
of data, 25 
of zeros, 195 
reserved fields, 19, 20 
RI10 instruction format, 29 
RI16 instruction format, 29 
RI18 instruction format, 29 
RI7 instruction format, 28 
rot, 129 
rotate and mask algebraic halfword, 145 
rotate and mask algebraic halfword immediate, 146 
rotate and mask algebraic word, 147 
rotate and mask algebraic word immediate, 148 
rotate and mask halfword, 136 
rotate and mask halfword immediate, 137 
rotate and mask quadword by bits, 143 
rotate and mask quadword by bits immediate, 144 
rotate and mask quadword by bytes, 140 
rotate and mask quadword by bytes immediate, 141 
rotate and mask quadword bytes from bit shift count, 142 
rotate and mask word, 138 
rotate and mask word immediate, 139 
rotate halfword, 127 
rotate halfword immediate, 128 
rotate instructions, 117—148 
rotate quadword by bits, 134 
rotate quadword by bits immediate, 135 
rotate quadword by bytes, 131 
rotate quadword by bytes from bit shift count, 133 
rotate quadword by bytes immediate, 132 
rotate word, 129 
rotate word immediate, 130 
roth, 127 
rothi, 128 
rothm, 136 
rothmi, 137 
roti, 130 
rotm, 138 
rotma, 147 
rotmah, 145 
rotmahi, 146 
rotmai, 148 
rotmi, 139 
rotqbi, 134 
rotqbii, 135 
rotqby, 131 
rotqbybi, 133

rotqbyi, 132 

rotqmbi, 143 

rotqmbii, 144 

rotqmby, 140 

rotqmbybi, 142 

rotqmbyi, 141 

rounding control, slice 0, 200, 269 
rounding control, slice 1, 200, 269 
rounding mode, support for, 196 
rounding modes, independent control of, 197 
rounding numbers, 198 

RR instruction format, 28 

RRR instruction format, 28 

RT field, defined, 19 

RTL instruction definitions, 18 


S 


selb, 115 

select bits, 115 

self-modifying code, 256 

setting execution state, 258 

sf, 64 

sfh, 62 

sfhi, 63 

sfi, 65 

sfx, 69 

shift instructions, 117—148 

shift left halfword, 118 

shift left halfword immediate, 119 

shift left quadword by bits, 122 

shift left quadword by bits immediate, 123 

shift left quadword by bytes, 124 

shift left quadword by bytes from bit shift count, 126 

shift left quadword by bytes immediate, 125 

shift left word, 120 

shift left word immediate, 121 

shl, 120 

shlh, 118 

shlhi, 119 

shli, 121 

shlqbi, 122 

shlqbii, 123 

shlqby, 124 

shlqbybi, 126 

shlqbyi, 125

shufb, 116 

shufb instruction, 265 

shuffle bytes, 116 

signed comparison signs, 20 

signed multiplication symbol, 20 

single precision (extended-range mode) operations, 195 

single-precision (extended-range mode) minimum and 
maximum values, 195, 198 
single-precision format, converting, 198
special-purpose register, move from, 244 
special-purpose register, move to, 245 
speculation 
of channel reads and channel writes, 257 
of SPU local storage access, 253, 254 
SPU architecture, 23, 25 
SPU floating point versus IEEE standard floating point, 
195 
SPU internal execution state, 254 
SPU interrupt facility, 251 
SPU interrupt facility channels, 252 
SPU interrupt handler, 251 
SPU ISA, reference materials for, 13 
SPU loads and stores and dsync and sync instructions, 
258 
SPU local storage access, 253 
caching, 254, 256 
reordering, 254 
speculation of, 253, 254 
SPU_RdSRR0 channel and interrupt facility, 252
SPU_WrSRR0 channel and interrupt facility, 252
SRR0 Register and SPU interrupt handler, 251
stop, 238 
stop and signal, 238 
stop and signal with dependencies, 239 
stopd, 239 
store quadword (a-form), 38 
store quadword (d-form), 36 
store quadword (x-form), 37 
store quadword instruction relative (a-form), 39 
store transaction, 253 
stqa, 38
stqd, 36
stqr, 39 
stqx, 37 
subtract from extended, 69 
subtract from halfword, 62 
subtract from halfword immediate, 63 
subtract from word, 64 
subtract from word immediate, 65 
subtraction, two's complement, 20 
sum bytes into halfwords, 93 
sumb, 93 
support for denorms (DENORM), infinity (INF), and not a number (NaN), 196
sync, 242 
sync instruction, 254, 255 
caching SPU local storage access, 256 
self-modifying code and, 256 
SPU loads and stores and, 258 
sync.c instruction, 254, 255 
synchronization, 253 
synchronization instructions, 255 
synchronization primitives, 254, 255 
synchronization, ordering and, 253 
synchronizing, 242
multiple accesses to local storage, 256 
through channel interface, 258 
through local storage, 257 
synchronizing data, 243 
systems with multiple accesses to local storage, 253 


T 


temporary RTL names, 18 
trademarks, 2 

transaction ordering, 253 
transaction synchronization, 253 
truncation, support for, 196 

two’s complement addition, 20 
two’s complement subtraction, 20 


U, V, W 


u, defined, 20 
unary minus, 20 
unary NOT operator, 20 
underflow (UNF) exception condition, 196, 199 
UNF. See underflow (UNF) exception condition 
unsigned comparison signs, 20 
unsigned multiplication symbol, 20 
word insertion, 266 
words 
bit and byte numbering, 26 
wrch, 250 
write channel, 250 


X 


xor, 108 
xorbi, 109 
xorhi, 110 
xori, 111 
xsbh, 94 
xshw, 95 
xswd, 96 


Z 


zeros, representation of, 195 